You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by "ingvay7@yahoo.com" <in...@yahoo.com> on 2012/11/13 16:36:35 UTC
Help with Script
hey all,
Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.
I get errors such as Error during parsing. Invalid alias: serverin {time: double,count: double}
I am basically trying to duplicate the following SQL query:
select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
from TableA
group by 1, 2, 3;
My script is as follows:
a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float,
);
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby
G2 = group reqd by Server,Type,Ops;
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
--byServeroperation = order G3 by Server;
store G3 into 'Servertest';
ingvay7
Re: Fw: Help with Script
Posted by pablomar <pa...@gmail.com>.
what about something like this ?
G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
count;
On Tue, Nov 13, 2012 at 12:57 PM, ingvay7 <in...@yahoo.com> wrote:
> (Apologies for resending but corrected script below)
>
>
> This is the error I got:
>
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
> matching function for org.apache.pig.builtin.SUM as multiple or none of
> them fit. Please use an explicit cast.
>
> Updated code:
>
> a = LOAD 'Report' AS (
> dt:chararray,
> Server:chararray,
> Type:chararray,
> Ops:chararray,
> UserID:chararray,
> U_cnt:int,
> U_tm:int,
> U_min_tm:int,
> U_max_tm:int,
> U_avg_tm:float
> );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
>
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
>
> --Groupby
> G2 = group reqd by (Server,Type,Ops);
>
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
>
> store G3 into 'Servertest';
>
>
>
>
>
>
>
> ----- Original Message -----
> From: Prashant Kommireddi <pr...@gmail.com>
> To: "user@pig.apache.org" <us...@pig.apache.org>
> Cc:
> Sent: Tuesday, November 13, 2012 11:59 AM
> Subject: Re: Help with Script
>
> Hi,
>
> Can you paste the error message here?
>
> Sent from my iPhone
>
> On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com>
> wrote:
>
> > hey all,
> >
> > Very new Pig user here. I think I'm trying to get something very simple
> done but getting a few errors. See me script below.Any guidance will be
> appreciated.Thanks.
> >
> > I get errors such as Error during parsing. Invalid alias: serverin
> {time: double,count: double}
> > I am basically trying to duplicate the following SQL query:
> >
> > select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> > from TableA
> > group by 1, 2, 3;
> >
> >
> >
> > My script is as follows:
> >
> > a = LOAD 'Report' AS (
> > dt:chararray,
> > Server:chararray,
> > Type:chararray,
> > Ops:chararray,
> > UserID:chararray,
> > U_cnt:int,
> > U_tm:int,
> > U_min_tm:int,
> > U_max_tm:int,
> > U_avg_tm:float,
> > );
> >
> >
> > --Remove Test Servers
> > remtest = filter a by not Server matches 'Test%';
> > -- Filter to required columns
> > reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> > --Groupby
> > G2 = group reqd by Server,Type,Ops;
> > --Sum the User Counts and Times
> > G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> > --byServeroperation = order G3 by Server;
> > store G3 into 'Servertest';
> >
> > ingvay7
>
>
Re: Help with Script
Posted by ingvay7 <in...@yahoo.com>.
Thanks, Prashant and Pablomar. That fixed it!
----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc:
Sent: Tuesday, November 13, 2012 1:40 PM
Subject: Re: Help with Script
SUM function requires that you specify the specific element from the
grouping. In this case, U_tm and U_cnt are both within group/bags and need
to be accessed as "reqd.U_tm" and "reqd.U_cnt".
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
count;
On Nov 13, 2012, at 9:58 AM, ingvay7 <in...@yahoo.com> wrote:
(Apologies for resending but corrected script below)
This is the error I got:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
matching function for org.apache.pig.builtin.SUM as multiple or none of
them fit. Please use an explicit cast.
Updated code:
a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float
);
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby
G2 = group reqd by (Server,Type,Ops);
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
store G3 into 'Servertest';
----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc:
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script
Hi,
Can you paste the error message here?
Sent from my iPhone
On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:
hey all,
Very new Pig user here. I think I'm trying to get something very simple
done but getting a few errors. See me script below.Any guidance will be
appreciated.Thanks.
I get errors such as Error during parsing. Invalid alias: serverin {time:
double,count: double}
I am basically trying to duplicate the following SQL query:
select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
from TableA
group by 1, 2, 3;
My script is as follows:
a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float,
);
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby
G2 = group reqd by Server,Type,Ops;
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
--byServeroperation = order G3 by Server;
store G3 into 'Servertest';
ingvay7
Re: Help with Script
Posted by Prashant Kommireddi <pr...@gmail.com>.
SUM function requires that you specify the specific element from the
grouping. In this case, U_tm and U_cnt are both within group/bags and need
to be accessed as "reqd.U_tm" and "reqd.U_cnt".
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
count;
On Nov 13, 2012, at 9:58 AM, ingvay7 <in...@yahoo.com> wrote:
(Apologies for resending but corrected script below)
This is the error I got:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
matching function for org.apache.pig.builtin.SUM as multiple or none of
them fit. Please use an explicit cast.
Updated code:
a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float
);
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby
G2 = group reqd by (Server,Type,Ops);
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
store G3 into 'Servertest';
----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc:
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script
Hi,
Can you paste the error message here?
Sent from my iPhone
On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:
hey all,
Very new Pig user here. I think I'm trying to get something very simple
done but getting a few errors. See me script below.Any guidance will be
appreciated.Thanks.
I get errors such as Error during parsing. Invalid alias: serverin {time:
double,count: double}
I am basically trying to duplicate the following SQL query:
select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
from TableA
group by 1, 2, 3;
My script is as follows:
a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float,
);
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby
G2 = group reqd by Server,Type,Ops;
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
--byServeroperation = order G3 by Server;
store G3 into 'Servertest';
ingvay7
Fw: Help with Script
Posted by ingvay7 <in...@yahoo.com>.
(Apologies for resending but corrected script below)
This is the error I got:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
Updated code:
a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float
);
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby
G2 = group reqd by (Server,Type,Ops);
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
store G3 into 'Servertest';
----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc:
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script
Hi,
Can you paste the error message here?
Sent from my iPhone
On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:
> hey all,
>
> Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.
>
> I get errors such as Error during parsing. Invalid alias: serverin {time: double,count: double}
> I am basically trying to duplicate the following SQL query:
>
> select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> from TableA
> group by 1, 2, 3;
>
>
>
> My script is as follows:
>
> a = LOAD 'Report' AS (
> dt:chararray,
> Server:chararray,
> Type:chararray,
> Ops:chararray,
> UserID:chararray,
> U_cnt:int,
> U_tm:int,
> U_min_tm:int,
> U_max_tm:int,
> U_avg_tm:float,
> );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> --Groupby
> G2 = group reqd by Server,Type,Ops;
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> --byServeroperation = order G3 by Server;
> store G3 into 'Servertest';
>
> ingvay7
Re: Help with Script
Posted by Vishwanath <in...@yahoo.com>.
This is the error I got:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
Updated code:
a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float
);
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby
G2 = group reqd by (Server,Type,Ops);
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
store G3 into 'Servertest';
----- Original Message -----
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@pig.apache.org" <us...@pig.apache.org>
Cc:
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script
Hi,
Can you paste the error message here?
Sent from my iPhone
On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:
> hey all,
>
> Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.
>
> I get errors such as Error during parsing. Invalid alias: serverin {time: double,count: double}
> I am basically trying to duplicate the following SQL query:
>
> select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> from TableA
> group by 1, 2, 3;
>
>
>
> My script is as follows:
>
> a = LOAD 'Report' AS (
> dt:chararray,
> Server:chararray,
> Type:chararray,
> Ops:chararray,
> UserID:chararray,
> U_cnt:int,
> U_tm:int,
> U_min_tm:int,
> U_max_tm:int,
> U_avg_tm:float,
> );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> --Groupby
> G2 = group reqd by Server,Type,Ops;
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> --byServeroperation = order G3 by Server;
> store G3 into 'Servertest';
>
> ingvay7
Re: Help with Script
Posted by Prashant Kommireddi <pr...@gmail.com>.
Hi,
Can you paste the error message here?
Sent from my iPhone
On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" <in...@yahoo.com> wrote:
> hey all,
>
> Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.
>
> I get errors such as Error during parsing. Invalid alias: serverin {time: double,count: double}
> I am basically trying to duplicate the following SQL query:
>
> select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> from TableA
> group by 1, 2, 3;
>
>
>
> My script is as follows:
>
> a = LOAD 'Report' AS (
> dt:chararray,
> Server:chararray,
> Type:chararray,
> Ops:chararray,
> UserID:chararray,
> U_cnt:int,
> U_tm:int,
> U_min_tm:int,
> U_max_tm:int,
> U_avg_tm:float,
> );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> --Groupby
> G2 = group reqd by Server,Type,Ops;
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> --byServeroperation = order G3 by Server;
> store G3 into 'Servertest';
>
> ingvay7
Re: Help with Script
Posted by pablomar <pa...@gmail.com>.
just taking a quick look, I see a couple of errors:
1_ your LOAD hast one more comma. You need to delete the last one, after
U_avg_tm:float
2_ and then, the group by, I think you need parenthesis
G2 = group reqd by (Server,Type,Ops);
by the way, where is you alias serverin ?
On Tue, Nov 13, 2012 at 10:36 AM, ingvay7@yahoo.com <in...@yahoo.com>wrote:
> hey all,
>
> Very new Pig user here. I think I'm trying to get something very simple
> done but getting a few errors. See me script below.Any guidance will be
> appreciated.Thanks.
>
> I get errors such as Error during parsing. Invalid alias: serverin {time:
> double,count: double}
> I am basically trying to duplicate the following SQL query:
>
> select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
> from TableA
> group by 1, 2, 3;
>
>
>
> My script is as follows:
>
> a = LOAD 'Report' AS (
> dt:chararray,
> Server:chararray,
> Type:chararray,
> Ops:chararray,
> UserID:chararray,
> U_cnt:int,
> U_tm:int,
> U_min_tm:int,
> U_max_tm:int,
> U_avg_tm:float,
> );
>
>
> --Remove Test Servers
> remtest = filter a by not Server matches 'Test%';
> -- Filter to required columns
> reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
> --Groupby
> G2 = group reqd by Server,Type,Ops;
> --Sum the User Counts and Times
> G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
> --byServeroperation = order G3 by Server;
> store G3 into 'Servertest';
>
> ingvay7