Posted to user@mahout.apache.org by Necati Demir <nd...@demir.web.tr> on 2012/03/24 00:23:57 UTC
wordcounts are not integer
Hello,
I am running the seq2sparse command with the parameter -ng 2.
When I dump the wordcount/ngrams/part-r-00000 file, I see that the values are
not integers. How are n-gram values calculated in Mahout?
--
Necati DEMİR
--------------------
Re: wordcounts are not integer
Posted by Lance Norskog <go...@gmail.com>.
They are not raw word counts. Instead, they are computed using various
formulae. I don't know where these are documented.
On Sun, Mar 25, 2012 at 1:01 PM, Necati Demir <nd...@demir.web.tr> wrote:
> You are right. I asked my question the wrong way.
>
> What I meant to ask is why some values are something like 25.5. How can a
> word count have a fractional part such as 0.5? Part of the file is shown below:
>
> Key: 108 1 1: Value: 241.7667508731829
> Key: 108 4: Value: 8.554995151411276
> Key: 108 4 during: Value: 25.260550610371865
> Key: 108 billion: Value: 20.98225432772597
> Key: 108 kg: Value: 24.666483410952424
> Key: 108 kg a4: Value: 44.2003664152453
--
Lance Norskog
goksron@gmail.com
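
For readers wondering what kind of formula could turn counts into values like 25.26: one widely used score for n-grams is Dunning's log-likelihood ratio (LLR), which is computed from integer co-occurrence counts but is itself a real number. Whether seq2sparse writes exactly this score to wordcount/ngrams is an assumption here and is not confirmed anywhere in this thread. The Python sketch below only illustrates the general technique; the function names and the sample counts are made up for the example.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of a bigram "A B".

    k11: times A is followed by B
    k12: times A is followed by something other than B
    k21: times B is preceded by something other than A
    k22: bigrams containing neither A first nor B second
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    # Hypothetical counts, not taken from the dump quoted in this thread.
    print(llr(10, 5, 3, 1000))
```

All four inputs are integers, yet the result is generally not, which would explain a dump full of fractional values if the file holds scores of this kind rather than counts.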
Re: wordcounts are not integer
Posted by Necati Demir <nd...@demir.web.tr>.
You are right. I asked my question the wrong way.
What I meant to ask is why some values are something like 25.5. How can a
word count have a fractional part such as 0.5? Part of the file is shown below:
Key: 108 1 1: Value: 241.7667508731829
Key: 108 4: Value: 8.554995151411276
Key: 108 4 during: Value: 25.260550610371865
Key: 108 billion: Value: 20.98225432772597
Key: 108 kg: Value: 24.666483410952424
Key: 108 kg a4: Value: 44.2003664152453
On 25 March 2012 02:59, Lance Norskog <go...@gmail.com> wrote:
> The counts are doubles. Vectors in Mahout are always doubles.
--
Necati DEMİR
--------------------
Re: wordcounts are not integer
Posted by Lance Norskog <go...@gmail.com>.
The counts are doubles. Vectors in Mahout are always doubles.
--
Lance Norskog
goksron@gmail.com