You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by "ingvay7@yahoo.com" <in...@yahoo.com> on 2012/11/13 16:36:35 UTC

Help with Script

hey all,

Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.

I get errors such as  Error during parsing. Invalid alias: serverin {time: double,count: double}
I am basically trying to duplicate the following SQL query:

select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt) 
from TableA 
group by 1, 2, 3;



My script is as follows:

a = LOAD 'Report' AS (
		dt:chararray,
		Server:chararray,
		Type:chararray,
		Ops:chararray,
		UserID:chararray,
		U_cnt:int,
		U_tm:int,
		U_min_tm:int,
		U_max_tm:int,
		U_avg_tm:float,
		);


--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby 
G2 = group reqd by Server,Type,Ops;
--Sum the User Counts and Times 
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
--byServeroperation = order G3 by Server;
store G3 into 'Servertest';

ingvay7

Re: Fw: Help with Script

Posted by pablomar <pa...@gmail.com>.
what about something like this ?

G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
count;



On Tue, Nov 13, 2012 at 12:57 PM, ingvay7 <in...@yahoo.com> wrote:

> (Apologies for resending but corrected script below)
>
>
> This is the error I got:
>
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
> matching function for org.apache.pig.builtin.SUM as multiple or none of
> them fit. Please use an explicit cast.
>
> Updated code:
>
> a = LOAD 'Report' AS (
>         dt:chararray,
>         Server:chararray,
>         Type:chararray,
>         Ops:chararray,
>         UserID:chararray,
>         U_cnt:int,
>         U_tm:int,
>         U_min_tm:int,
>         U_max_tm:int,
>         U_avg_tm:float
>         );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
>
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
>
> --Groupby
> G2 = group reqd by (Server,Type,Ops);
>
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
>
> store G3 into 'Servertest';
>
>
>
>
>
>
>
> ----- Original Message -----
> From: Prashant Kommireddi <pr...@gmail.com>
> To: "user@pig.apache.org" <us...@pig.apache.org>
> Cc:
> Sent: Tuesday, November 13, 2012 11:59 AM
> Subject: Re: Help with Script
>
> Hi,
>
> Can you paste the error message here?
>
> Sent from my iPhone
>
> On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com>
> wrote:
>
> > hey all,
> >
> > Very new Pig user here. I think I'm trying to get something very simple
> done but getting a few errors. See me script below.Any guidance will be
> appreciated.Thanks.
> >
> > I get errors such as  Error during parsing. Invalid alias: serverin
> {time: double,count: double}
> > I am basically trying to duplicate the following SQL query:
> >
> > select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> > from TableA
> > group by 1, 2, 3;
> >
> >
> >
> > My script is as follows:
> >
> > a = LOAD 'Report' AS (
> >        dt:chararray,
> >        Server:chararray,
> >        Type:chararray,
> >        Ops:chararray,
> >        UserID:chararray,
> >        U_cnt:int,
> >        U_tm:int,
> >        U_min_tm:int,
> >        U_max_tm:int,
> >        U_avg_tm:float,
> >        );
> >
> >
> > --Remove Test Servers
> > remtest = filter a by not Server matches 'Test%';
> > -- Filter to required columns
> > reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> > --Groupby
> > G2 = group reqd by Server,Type,Ops;
> > --Sum the User Counts and Times
> > G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> > --byServeroperation = order G3 by Server;
> > store G3 into 'Servertest';
> >
> > ingvay7
>
>

Re: Help with Script

Posted by ingvay7 <in...@yahoo.com>.
Thanks, Prashant and Pablomar. That fixed it!




----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc: 
Sent: Tuesday, November 13, 2012 1:40 PM
Subject: Re: Help with Script

SUM function requires that you specify the specific element from the
grouping. In this case, U_tm and U_cnt are both within group/bags and need
to be accessed as "reqd.U_tm" and "reqd.U_cnt".


--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
count;

On Nov 13, 2012, at 9:58 AM, ingvay7 <in...@yahoo.com> wrote:

(Apologies for resending but corrected script below)


This is the error I got:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
matching function for org.apache.pig.builtin.SUM as multiple or none of
them fit. Please use an explicit cast.

Updated code:

a = LOAD 'Report' AS (
        dt:chararray,
        Server:chararray,
        Type:chararray,
        Ops:chararray,
        UserID:chararray,
        U_cnt:int,
        U_tm:int,
        U_min_tm:int,
        U_max_tm:int,
        U_avg_tm:float
        );


--Remove Test Servers
remtest = filter a by not Server matches 'Test%';

-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby
G2 = group reqd by (Server,Type,Ops);

--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

store G3 into 'Servertest';







----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc:
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script

Hi,

Can you paste the error message here?

Sent from my iPhone

On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:

hey all,


Very new Pig user here. I think I'm trying to get something very simple
done but getting a few errors. See me script below.Any guidance will be
appreciated.Thanks.


I get errors such as  Error during parsing. Invalid alias: serverin {time:
double,count: double}

I am basically trying to duplicate the following SQL query:


select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)

from TableA

group by 1, 2, 3;




My script is as follows:


a = LOAD 'Report' AS (

        dt:chararray,

        Server:chararray,

        Type:chararray,

        Ops:chararray,

        UserID:chararray,

        U_cnt:int,

        U_tm:int,

        U_min_tm:int,

        U_max_tm:int,

        U_avg_tm:float,

        );



--Remove Test Servers

remtest = filter a by not Server matches 'Test%';

-- Filter to required columns

reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby

G2 = group reqd by Server,Type,Ops;

--Sum the User Counts and Times

G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

--byServeroperation = order G3 by Server;

store G3 into 'Servertest';


ingvay7


Re: Help with Script

Posted by Prashant Kommireddi <pr...@gmail.com>.
SUM function requires that you specify the specific element from the
grouping. In this case, U_tm and U_cnt are both within group/bags and need
to be accessed as "reqd.U_tm" and "reqd.U_cnt".


--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
count;

On Nov 13, 2012, at 9:58 AM, ingvay7 <in...@yahoo.com> wrote:

(Apologies for resending but corrected script below)


This is the error I got:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
matching function for org.apache.pig.builtin.SUM as multiple or none of
them fit. Please use an explicit cast.

Updated code:

a = LOAD 'Report' AS (
        dt:chararray,
        Server:chararray,
        Type:chararray,
        Ops:chararray,
        UserID:chararray,
        U_cnt:int,
        U_tm:int,
        U_min_tm:int,
        U_max_tm:int,
        U_avg_tm:float
        );


--Remove Test Servers
remtest = filter a by not Server matches 'Test%';

-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby
G2 = group reqd by (Server,Type,Ops);

--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

store G3 into 'Servertest';







----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc:
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script

Hi,

Can you paste the error message here?

Sent from my iPhone

On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:

hey all,


Very new Pig user here. I think I'm trying to get something very simple
done but getting a few errors. See me script below.Any guidance will be
appreciated.Thanks.


I get errors such as  Error during parsing. Invalid alias: serverin {time:
double,count: double}

I am basically trying to duplicate the following SQL query:


select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)

from TableA

group by 1, 2, 3;




My script is as follows:


a = LOAD 'Report' AS (

        dt:chararray,

        Server:chararray,

        Type:chararray,

        Ops:chararray,

        UserID:chararray,

        U_cnt:int,

        U_tm:int,

        U_min_tm:int,

        U_max_tm:int,

        U_avg_tm:float,

        );



--Remove Test Servers

remtest = filter a by not Server matches 'Test%';

-- Filter to required columns

reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby

G2 = group reqd by Server,Type,Ops;

--Sum the User Counts and Times

G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

--byServeroperation = order G3 by Server;

store G3 into 'Servertest';


ingvay7

Fw: Help with Script

Posted by ingvay7 <in...@yahoo.com>.
(Apologies for resending but corrected script below)


This is the error I got:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

Updated code:

a = LOAD 'Report' AS (
        dt:chararray,
        Server:chararray,
        Type:chararray,
        Ops:chararray,
        UserID:chararray,
        U_cnt:int,
        U_tm:int,
        U_min_tm:int,
        U_max_tm:int,
        U_avg_tm:float
        );


--Remove Test Servers
remtest = filter a by not Server matches 'Test%';

-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby 
G2 = group reqd by (Server,Type,Ops);

--Sum the User Counts and Times 
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

store G3 into 'Servertest';







----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc: 
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script

Hi,

Can you paste the error message here?

Sent from my iPhone

On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:

> hey all,
>
> Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.
>
> I get errors such as  Error during parsing. Invalid alias: serverin {time: double,count: double}
> I am basically trying to duplicate the following SQL query:
>
> select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> from TableA
> group by 1, 2, 3;
>
>
>
> My script is as follows:
>
> a = LOAD 'Report' AS (
>        dt:chararray,
>        Server:chararray,
>        Type:chararray,
>        Ops:chararray,
>        UserID:chararray,
>        U_cnt:int,
>        U_tm:int,
>        U_min_tm:int,
>        U_max_tm:int,
>        U_avg_tm:float,
>        );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> --Groupby
> G2 = group reqd by Server,Type,Ops;
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> --byServeroperation = order G3 by Server;
> store G3 into 'Servertest';
>
> ingvay7


Re: Help with Script

Posted by Vishwanath <in...@yahoo.com>.
This is the error I got:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

Updated code:

a = LOAD 'Report' AS (
        dt:chararray,
        Server:chararray,
        Type:chararray,
        Ops:chararray,
        UserID:chararray,
        U_cnt:int,
        U_tm:int,
        U_min_tm:int,
        U_max_tm:int,
        U_avg_tm:float
        );


--Remove Test Servers
remtest = filter a by not Server matches 'Test%';

-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby 
G2 = group reqd by (Server,Type,Ops);

--Sum the User Counts and Times 
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

store G3 into 'Servertest';







----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc: 
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script

Hi,

Can you paste the error message here?

Sent from my iPhone

On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:

> hey all,
>
> Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.
>
> I get errors such as  Error during parsing. Invalid alias: serverin {time: double,count: double}
> I am basically trying to duplicate the following SQL query:
>
> select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> from TableA
> group by 1, 2, 3;
>
>
>
> My script is as follows:
>
> a = LOAD 'Report' AS (
>        dt:chararray,
>        Server:chararray,
>        Type:chararray,
>        Ops:chararray,
>        UserID:chararray,
>        U_cnt:int,
>        U_tm:int,
>        U_min_tm:int,
>        U_max_tm:int,
>        U_avg_tm:float,
>        );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> --Groupby
> G2 = group reqd by Server,Type,Ops;
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> --byServeroperation = order G3 by Server;
> store G3 into 'Servertest';
>
> ingvay7


Re: Help with Script

Posted by Prashant Kommireddi <pr...@gmail.com>.
Hi,

Can you paste the error message here?

Sent from my iPhone

On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:

> hey all,
>
> Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.
>
> I get errors such as  Error during parsing. Invalid alias: serverin {time: double,count: double}
> I am basically trying to duplicate the following SQL query:
>
> select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> from TableA
> group by 1, 2, 3;
>
>
>
> My script is as follows:
>
> a = LOAD 'Report' AS (
>        dt:chararray,
>        Server:chararray,
>        Type:chararray,
>        Ops:chararray,
>        UserID:chararray,
>        U_cnt:int,
>        U_tm:int,
>        U_min_tm:int,
>        U_max_tm:int,
>        U_avg_tm:float,
>        );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> --Groupby
> G2 = group reqd by Server,Type,Ops;
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> --byServeroperation = order G3 by Server;
> store G3 into 'Servertest';
>
> ingvay7

Re: Help with Script

Posted by pablomar <pa...@gmail.com>.
just taking a quick look, I see a couple of errors:
1_ your LOAD hast one more comma. You need to delete the last one, after
U_avg_tm:float

2_ and then, the group by, I think you need parenthesis
G2 = group reqd by (Server,Type,Ops);

by the way, where is you alias serverin ?



On Tue, Nov 13, 2012 at 10:36 AM, ingvay7@yahoo.com <in...@yahoo.com>wrote:

> hey all,
>
> Very new Pig user here. I think I'm trying to get something very simple
> done but getting a few errors. See me script below.Any guidance will be
> appreciated.Thanks.
>
> I get errors such as  Error during parsing. Invalid alias: serverin {time:
> double,count: double}
> I am basically trying to duplicate the following SQL query:
>
> select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> from TableA
> group by 1, 2, 3;
>
>
>
> My script is as follows:
>
> a = LOAD 'Report' AS (
>                 dt:chararray,
>                 Server:chararray,
>                 Type:chararray,
>                 Ops:chararray,
>                 UserID:chararray,
>                 U_cnt:int,
>                 U_tm:int,
>                 U_min_tm:int,
>                 U_max_tm:int,
>                 U_avg_tm:float,
>                 );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> --Groupby
> G2 = group reqd by Server,Type,Ops;
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> --byServeroperation = order G3 by Server;
> store G3 into 'Servertest';
>
> ingvay7