Posted to user@mahout.apache.org by "Ankur Desai (ankurdes)" <an...@cisco.com> on 2015/12/02 21:38:36 UTC

Mahout Collocation parameter

Hi,

I am running collocation on Mahout and am having trouble understanding what the minsupport parameter does.

I want to get the bigrams/trigrams that occur at least 5 times in the corpus. I set the minsupport value to 5, but I am still getting results that occur only once in the entire corpus.

Can someone please help me understand what this parameter is for, or how I can get bigrams/trigrams that occur at least X times?

Thanks,
Ankur

Re: Mahout Collocation parameter

Posted by JunTai Gong <go...@gmail.com>.
Hi,
The parameter 'unigram' may be what you want.

  --unigram (-u)              If set, unigrams will be emitted in the
                              final output alongside collocations

https://mahout.apache.org/users/basics/collocations.html


Joe
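
For anyone who wants to check independently which bigrams/trigrams really occur at least X times, here is a minimal Python sketch that counts n-grams in an already tokenized corpus and keeps only those at or above a threshold. It is not how Mahout implements minsupport; the helper names (ngram_counts, frequent_ngrams) and the tiny sample corpus are made up for illustration, and it assumes the corpus fits in memory.

```python
from collections import Counter


def ngram_counts(docs, n):
    """Count every run of n consecutive tokens across all documents."""
    counts = Counter()
    for tokens in docs:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequent_ngrams(docs, n, min_count):
    """Keep only the n-grams seen at least min_count times in the corpus."""
    return {gram: c for gram, c in ngram_counts(docs, n).items() if c >= min_count}


# Hypothetical sample corpus: one token list per document.
docs = [
    "the quick brown fox".split(),
    "the quick red fox".split(),
    "a quick brown dog".split(),
]

bigrams = frequent_ngrams(docs, 2, min_count=2)
trigrams = frequent_ngrams(docs, 3, min_count=2)

for gram, count in sorted(bigrams.items()):
    print(" ".join(gram), count)
```

Comparing counts like these against the collocation output for a small sample of the corpus may help show whether the single-occurrence results are n-grams or something else in the output.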
