Posted to user@mahout.apache.org by Necati Demir <nd...@demir.web.tr> on 2012/03/24 00:23:57 UTC

wordcounts are not integer

Hello,

I am running the seq2sparse command with the parameter -ng 2.
When I dump the wordcount/ngrams/part-r-00000 file, I see that the sum values
are not integers. How are n-gram values calculated in Mahout?


-- 
Necati DEMİR
--------------------

Re: wordcounts are not integer

Posted by Lance Norskog <go...@gmail.com>.
They are not raw word counts. Instead they are processed using various
formulae. I don't know where these are articulated.
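
One formula that produces values like these is the log-likelihood ratio
(LLR). seq2sparse has a minimum-LLR option (-ml), which suggests that the
n-gram pass scores each n-gram this way, but that is an assumption and is not
confirmed in this thread. The sketch below is a generic Python version of
Dunning's LLR over a 2x2 contingency table, not Mahout's own code; the
function names and the example counts are made up for illustration. It shows
how integer counts turn into a non-integer score.

```python
import math


def x_log_x(x):
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's LLR for a 2x2 contingency table of integer counts.

    k11: A and B together     k12: A without B
    k21: B without A          k22: neither A nor B
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero so rounding error cannot produce a tiny negative score.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))


def bigram_llr(count_ab, count_a, count_b, total):
    """LLR of the bigram "A B" (hypothetical helper, not a Mahout function).

    count_ab: occurrences of the bigram "A B"
    count_a:  bigrams whose first word is A
    count_b:  bigrams whose second word is B
    total:    total number of bigrams
    """
    k11 = count_ab
    k12 = count_a - count_ab
    k21 = count_b - count_ab
    k22 = total - count_a - count_b + count_ab
    return log_likelihood_ratio(k11, k12, k21, k22)


if __name__ == "__main__":
    # Every input is an integer count, yet the score is generally not.
    print(bigram_llr(count_ab=5, count_a=20, count_b=8, total=1000))
```

A score near zero means the two words co-occur about as often as chance
would predict; larger scores mean a stronger association. If the dumped
values are LLR scores, they should be read as association strengths rather
than as frequencies.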

On Sun, Mar 25, 2012 at 1:01 PM, Necati Demir <nd...@demir.web.tr> wrote:
> You are right. I phrased my question badly.
>
> What I meant to ask is why some values look like 25.5. How can a word count
> have a fractional part such as 0.5? Here is part of the file:
>
> Key: 108 1 1: Value: 241.7667508731829
> Key: 108 4: Value: 8.554995151411276
> Key: 108 4 during: Value: 25.260550610371865
> Key: 108 billion: Value: 20.98225432772597
> Key: 108 kg: Value: 24.666483410952424
> Key: 108 kg a4: Value: 44.2003664152453
>
>
>
>
> On 25 March 2012 02:59, Lance Norskog <go...@gmail.com> wrote:
>
>> The counts are doubles. Vectors in Mahout are always doubles.
>>
>> On Fri, Mar 23, 2012 at 4:23 PM, Necati Demir <nd...@demir.web.tr> wrote:
>> > Hello,
>> >
>> > I am running the seq2sparse command with the parameter -ng 2.
>> > When I dump the wordcount/ngrams/part-r-00000 file, I see that the sum values
>> > are not integers. How are n-gram values calculated in Mahout?
>> >
>> >
>> > --
>> > Necati DEMİR
>> > --------------------
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>
>
>
> --
> Necati DEMİR
> --------------------



-- 
Lance Norskog
goksron@gmail.com

Re: wordcounts are not integer

Posted by Necati Demir <nd...@demir.web.tr>.
You are right. I phrased my question badly.

What I meant to ask is why some values look like 25.5. How can a word count
have a fractional part such as 0.5? Here is part of the file:

Key: 108 1 1: Value: 241.7667508731829
Key: 108 4: Value: 8.554995151411276
Key: 108 4 during: Value: 25.260550610371865
Key: 108 billion: Value: 20.98225432772597
Key: 108 kg: Value: 24.666483410952424
Key: 108 kg a4: Value: 44.2003664152453




On 25 March 2012 02:59, Lance Norskog <go...@gmail.com> wrote:

> The counts are doubles. Vectors in Mahout are always doubles.
>
> On Fri, Mar 23, 2012 at 4:23 PM, Necati Demir <nd...@demir.web.tr> wrote:
> > Hello,
> >
> > I am running the seq2sparse command with the parameter -ng 2.
> > When I dump the wordcount/ngrams/part-r-00000 file, I see that the sum values
> > are not integers. How are n-gram values calculated in Mahout?
> >
> >
> > --
> > Necati DEMİR
> > --------------------
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>



-- 
Necati DEMİR
--------------------

Re: wordcounts are not integer

Posted by Lance Norskog <go...@gmail.com>.
The counts are doubles. Vectors in Mahout are always doubles.

On Fri, Mar 23, 2012 at 4:23 PM, Necati Demir <nd...@demir.web.tr> wrote:
> Hello,
>
> I am running the seq2sparse command with the parameter -ng 2.
> When I dump the wordcount/ngrams/part-r-00000 file, I see that the sum values
> are not integers. How are n-gram values calculated in Mahout?
>
>
> --
> Necati DEMİR
> --------------------



-- 
Lance Norskog
goksron@gmail.com