Posted to user@pig.apache.org by centerqi hu <ce...@gmail.com> on 2013/09/14 09:01:52 UTC

Is this a bug in the COUNT function?

The sample.txt file content:

android,u1,taobao1
android,u1,taobao1
,u2,taobao2

RR = LOAD '/user/www/udc/output/bugfind/sample.txt' USING PigStorage(',')
     AS (platform, machineID, productID);
RB = GROUP RR BY (productID);
RES = FOREACH RB {
    ITEMUV = DISTINCT RR.machineID;
    GENERATE FLATTEN(group), COUNT(ITEMUV) AS UV, COUNT(RR) AS PV;
};
DUMP RES;

OUTPUT:

(taobao1,1,2)
(taobao2,1,0)

Why is PV 0 for taobao2 while UV is 1?

I looked at the source code of the COUNT function.

If the first column of a tuple is null, cnt is not incremented:

while (it.hasNext()) {
    Tuple t = (Tuple) it.next();
    if (t != null && t.size() > 0 && t.get(0) != null)
        cnt++;
}

-- 
centerqi@gmail.com|齐忠

Re: Is this a bug in the COUNT function?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
That's actually the documented behavior:
https://pig.apache.org/docs/r0.10.0/func.html#count

There was some discussion about changing this:
https://issues.apache.org/jira/browse/PIG-1014

Patches gratefully accepted.

D
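
To make the behavior concrete, here is a minimal standalone Java sketch of the loop quoted in the original post. It is not Pig's actual code: List<Object> stands in for a Pig Tuple, the class and helper names (CountNullSketch, taobao2Group, distinctColumn) are hypothetical, and it assumes that the empty platform field in ",u2,taobao2" is loaded as null. The countStar method illustrates what the built-in COUNT_STAR, documented on the same page as COUNT, is described as doing: counting tuples regardless of nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Standalone sketch: a List<Object> plays the role of a Pig Tuple,
// and a List<List<Object>> plays the role of a bag.
class CountNullSketch {

    // Mirrors the quoted loop: a tuple is counted only if its first field is non-null.
    static long count(List<List<Object>> bag) {
        long cnt = 0;
        for (List<Object> t : bag) {
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // Counts every tuple, whatever its fields contain (the COUNT_STAR idea).
    static long countStar(List<List<Object>> bag) {
        return bag.size();
    }

    // Hypothetical helper: project one column and de-duplicate it,
    // roughly what DISTINCT RR.machineID produces (a bag of one-field tuples).
    static List<List<Object>> distinctColumn(List<List<Object>> bag, int col) {
        Set<Object> seen = new LinkedHashSet<>();
        for (List<Object> t : bag) {
            seen.add(t.get(col));
        }
        List<List<Object>> out = new ArrayList<>();
        for (Object v : seen) {
            out.add(Collections.singletonList(v));
        }
        return out;
    }

    // The taobao2 group from sample.txt; the empty platform is assumed to be null.
    static List<List<Object>> taobao2Group() {
        List<List<Object>> bag = new ArrayList<>();
        bag.add(Arrays.<Object>asList(null, "u2", "taobao2"));
        return bag;
    }

    public static void main(String[] args) {
        List<List<Object>> rr = taobao2Group();
        // First field (platform) is null, so the tuple is skipped.
        System.out.println("COUNT(RR) = " + count(rr));
        // Every tuple is counted.
        System.out.println("COUNT_STAR(RR) = " + countStar(rr));
        // After projecting machineID, the first field is "u2", so it is counted.
        System.out.println("COUNT(ITEMUV) = " + count(distinctColumn(rr, 1)));
    }
}
```

Under these assumptions the sketch gives 0 for COUNT(RR) and 1 for COUNT(ITEMUV), matching the (taobao2,1,0) row. In the script, replacing COUNT(RR) with COUNT_STAR(RR) should therefore count the row with the null platform as well.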

