Posted to user@mahout.apache.org by "Ankur Desai (ankurdes)" <an...@cisco.com> on 2015/12/02 21:38:36 UTC
Mahout Collocation parameter
Hi,
I am running collocation on Mahout and am having trouble understanding what the minsupport parameter is doing.
I want to get the bigrams/trigrams that occur at least 5 times in the corpus. I set the minsupport value to 5 and I am still getting results that occur only once in the entire corpus.
Can someone please help me understand what this parameter is for, or how I can get bigrams/trigrams that occur at least X number of times?
Thanks,
Ankur
Re: Mahout Collocation parameter
Posted by JunTai Gong <go...@gmail.com>.
Hi,
The parameter 'unigram' may be what you want.
--unigram (-u) If set, unigrams will be emitted in the
final output alongside collocations
https://mahout.apache.org/users/basics/collocations.html
Joe
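For reference, the filtering the original question asks for (keep only bigrams/trigrams that occur at least X times in the corpus) can be sketched outside Mahout. This is only a minimal illustration in plain Python, not Mahout's implementation and not a description of how minsupport behaves internally; the function names, the whitespace tokenization, and the sample documents are all made up for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram in a token list as a tuple."""
    # zip stops at the shortest slice, so lists shorter than n yield nothing.
    return zip(*(tokens[i:] for i in range(n)))


def frequent_ngrams(docs, n, min_count):
    """Count n-grams over all documents and keep those seen >= min_count times."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical toy corpus, tokenized on whitespace (an assumption for this sketch).
docs = [
    "the quick brown fox".split(),
    "the quick red fox".split(),
    "the quick brown dog".split(),
]

bigrams = frequent_ngrams(docs, n=2, min_count=2)
trigrams = frequent_ngrams(docs, n=3, min_count=2)
```

Comparing counts produced this way on a small sample of the corpus against the Mahout output is one way to check which n-grams are slipping past the threshold.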