You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by Syed Wasti <md...@hotmail.com> on 2010/05/07 06:33:03 UTC

GROUP Error

Hi,
Trying to make my script execute happily, it wont stop throwing errors.
It works if I don¹t group my data. But once I group,
It starts with;
ERROR 1000: Error during parsing. Invalid alias: id  in {group:
long,edu_affl: {id
I got rid of it,  then it complains about Invalid alias: org_type,
It keeps going and at the end it complains about mismatch in comparison, as
the left side is a bag and right side is chararray. How ?

Not sure, where I am going wrong, need help;


research =      LOAD 'count'
                USING PigStorage('\t')
                as (id: long, org_type: chararray, dept_type: int,
                cnt: int, cnt_distinct: int);

final_grp =     GROUP research BY id;

Final_table =   FOREACH final_grp
                GENERATE id,
                (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
                (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
                (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
                (org_type == 'C' AND dept_type == 4?cnt:0) AS
acc_grp_manual,
                (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
sale_distinct,
                (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
saleEx_distinct,
                (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
acc_grp_distinct,
                (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
accM_grp_distinct;

STORE final_table INTO 'final_count' USING PigStorage();

Regards
Syed

Re: GROUP Error

Posted by Syed Wasti <md...@hotmail.com>.
Hahaha, Thanks for your help, I am not at all confused and when I am, I will
come back to this rocking "pig-user" group ;)


On 5/7/10 9:46 AM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:

> "clear as mud" generally means totally unclear (mud being, well,
> muddy). So now I am confused as to whether I confused you or not :)
> 
> Regarding the relations -- only relations that are stored are actually
> materialized; the rest are computed and discarded on the fly. If you
> use multi-store, you can write a long script with multiple stores, and
> the execution plan will be constructed in such a way that when
> possible, work is reused. The grunt shell doesn't give you this, as it
> is interactive, and executes dumps and stores as soon as you enter
> them -- so it can't delay work until it knows all the things it will
> need to compute.
> 
> -Dmitriy
> 
> On Fri, May 7, 2010 at 9:26 AM, Syed Wasti <md...@hotmail.com> wrote:
>> Clear as mud, nice explanation. Thanks.
>> 
>> Mean while I want to throw another basic question on how relations work,
>> After you DUMP or STORE the results, what happens to all the relations
>> before DUMP or STORE. In a grunt shell I can still use the relations and do
>> something else with them. Are all the relations saved for that session ?
>> OR is the data for relations saved under tmp ?
>> Need some clarity on this.
>> 
>> 
>> On 5/6/10 10:05 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:
>> 
>>> It's id => group, org_type => research.org_type and dept_type =>
>>> research.dept_type
>>> But that still doesn't get you where you want to go.. read on.
>>> 
>>> Here's a quick explanation.
>>> When you group a relation, the result is a new relation with two
>>> columns: "group" and the name of the original relation.
>>> The group column has the schema of what you grouped by. If you grouped
>>> by an integer column, for example, the type will be int.
>>> If you grouped by a tuple of several columns -- "foo = group bar by
>>> (a, b);" -- the "group" column will be a tuple with two fields, a and
>>> b.
>>> They can be retrieved by flattening "group", or by directly accessing
>>> them: "group.a, group.b".
>>> 
>>> The second column will be named after the original relation, and
>>> contain a *bag* of all the rows in the original relation that match
>>> the corresponding group. The rows are unaltered -- so they still
>>> contain columns you grouped by, for example.  The way bags work is,
>>> referring to mybag.some_field essentially means "for each tuple in the
>>> bag, give me some_field in that tuple".  So you can do things like
>>> 
>>> group_counts = foreach grouped_data generate
>>>   group as key,  -- the key you grouped on
>>>   COUNT(original_data), -- the number of rows that have that key
>>>   MAX(original_data.cost); -- the maximum cost of items that have that key
>>> 
>>> Note that all the functions in this example are aggregates. That's
>>> because they are things we can do to a *collection of values*. It
>>> doesn't make sense to do something like what you are trying -- check
>>> if some field is equal to S, for example.  That's because there may be
>>> multiple rows in your original data that have the same id, but
>>> different org_type. What can pig give you when it encounters a row
>>> like below (I represent it with field names for clarity):
>>> 
>>> ( group=1, { (  id=1, org_type='S', cnt=3 ), ( id=1, org_type='L", cnt=4)}
>>> 
>>> Should it interpret org_type as S, or L?
>>> 
>>> I suspect that what you actually want is something like this:
>>> 
>>> -- load
>>> research =      LOAD 'count'
>>>                 USING PigStorage('\t')
>>>                 as (id: long, org_type: chararray, dept_type: int,
>>>                 cnt: int, cnt_distinct: int);
>>> 
>>> -- break out the counts
>>> research = FOREACH research
>>>                 GENERATE id,
>>>                 (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
>>>                 (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
>>>                 (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
>>>                 (org_type == 'C' AND dept_type == 4?cnt:0) AS
>>> acc_grp_manual,
>>>                 (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
>>> sale_distinct,
>>>                 (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
>>> saleEx_distinct,
>>>                 (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
>>> acc_grp_distinct,
>>>                 (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
>>> accM_grp_distinct;
>>> 
>>> -- group and generate
>>> grouped = group research by id;
>>> final_data = FOREACH grouped GENERATE group as id, SUM(research.sale)
>>> as total_sales, SUM(research.saleEx) as total_saleEx, .....;
>>> 
>>> Or perhaps you want the pivot of that:
>>> 
>>> research = LOAD....
>>> grouped = group research by (id, org_type, dept_type);
>>> final_data = FOREACH grouped GENERATE
>>>   FLATTEN(group) as (id, org_type, dept_type),
>>>  (group.org_type == 'S' AND group.dept_type == 1? SUM(research.cnt) :0) AS
>>> sale,
>>> .....
>>> 
>>> Hope this helps.
>>> 
>>> -Dmitriy
>>> 
>>> On Thu, May 6, 2010 at 9:33 PM, Syed Wasti <md...@hotmail.com> wrote:
>>>> Hi,
>>>> Trying to make my script execute happily, it wont stop throwing errors.
>>>> It works if I don¹t group my data. But once I group,
>>>> It starts with;
>>>> ERROR 1000: Error during parsing. Invalid alias: id  in {group:
>>>> long,edu_affl: {id
>>>> I got rid of it,  then it complains about Invalid alias: org_type,
>>>> It keeps going and at the end it complains about mismatch in comparison, as
>>>> the left side is a bag and right side is chararray. How ?
>>>> 
>>>> Not sure, where I am going wrong, need help;
>>>> 
>>>> 
>>>> research =      LOAD 'count'
>>>>                USING PigStorage('\t')
>>>>                as (id: long, org_type: chararray, dept_type: int,
>>>>                cnt: int, cnt_distinct: int);
>>>> 
>>>> final_grp =     GROUP research BY id;
>>>> 
>>>> Final_table =   FOREACH final_grp
>>>>                GENERATE id,
>>>>                (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
>>>>                (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
>>>>                (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
>>>>                (org_type == 'C' AND dept_type == 4?cnt:0) AS
>>>> acc_grp_manual,
>>>>                (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
>>>> sale_distinct,
>>>>                (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
>>>> saleEx_distinct,
>>>>                (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
>>>> acc_grp_distinct,
>>>>                (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
>>>> accM_grp_distinct;
>>>> 
>>>> STORE final_table INTO 'final_count' USING PigStorage();
>>>> 
>>>> Regards
>>>> Syed
>>>> 
>>> 
>> 
>> 
>> 
> 



Re: GROUP Error

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
"clear as mud" generally means totally unclear (mud being, well,
muddy). So now I am confused as to whether I confused you or not :)

Regarding the relations -- only relations that are stored are actually
materialized; the rest are computed and discarded on the fly. If you
use multi-store, you can write a long script with multiple stores, and
the execution plan will be constructed in such a way that when
possible, work is reused. The grunt shell doesn't give you this, as it
is interactive, and executes dumps and stores as soon as you enter
them -- so it can't delay work until it knows all the things it will
need to compute.

-Dmitriy

On Fri, May 7, 2010 at 9:26 AM, Syed Wasti <md...@hotmail.com> wrote:
> Clear as mud, nice explanation. Thanks.
>
> Mean while I want to throw another basic question on how relations work,
> After you DUMP or STORE the results, what happens to all the relations
> before DUMP or STORE. In a grunt shell I can still use the relations and do
> something else with them. Are all the relations saved for that session ?
> OR is the data for relations saved under tmp ?
> Need some clarity on this.
>
>
> On 5/6/10 10:05 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:
>
>> It's id => group, org_type => research.org_type and dept_type =>
>> research.dept_type
>> But that still doesn't get you where you want to go.. read on.
>>
>> Here's a quick explanation.
>> When you group a relation, the result is a new relation with two
>> columns: "group" and the name of the original relation.
>> The group column has the schema of what you grouped by. If you grouped
>> by an integer column, for example, the type will be int.
>> If you grouped by a tuple of several columns -- "foo = group bar by
>> (a, b);" -- the "group" column will be a tuple with two fields, a and
>> b.
>> They can be retrieved by flattening "group", or by directly accessing
>> them: "group.a, group.b".
>>
>> The second column will be named after the original relation, and
>> contain a *bag* of all the rows in the original relation that match
>> the corresponding group. The rows are unaltered -- so they still
>> contain columns you grouped by, for example.  The way bags work is,
>> referring to mybag.some_field essentially means "for each tuple in the
>> bag, give me some_field in that tuple".  So you can do things like
>>
>> group_counts = foreach grouped_data generate
>>   group as key,  -- the key you grouped on
>>   COUNT(original_data), -- the number of rows that have that key
>>   MAX(original_data.cost); -- the maximum cost of items that have that key
>>
>> Note that all the functions in this example are aggregates. That's
>> because they are things we can do to a *collection of values*. It
>> doesn't make sense to do something like what you are trying -- check
>> if some field is equal to S, for example.  That's because there may be
>> multiple rows in your original data that have the same id, but
>> different org_type. What can pig give you when it encounters a row
>> like below (I represent it with field names for clarity):
>>
>> ( group=1, { (  id=1, org_type='S', cnt=3 ), ( id=1, org_type='L", cnt=4)}
>>
>> Should it interpret org_type as S, or L?
>>
>> I suspect that what you actually want is something like this:
>>
>> -- load
>> research =      LOAD 'count'
>>                 USING PigStorage('\t')
>>                 as (id: long, org_type: chararray, dept_type: int,
>>                 cnt: int, cnt_distinct: int);
>>
>> -- break out the counts
>> research = FOREACH research
>>                 GENERATE id,
>>                 (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
>>                 (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
>>                 (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
>>                 (org_type == 'C' AND dept_type == 4?cnt:0) AS acc_grp_manual,
>>                 (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
>> sale_distinct,
>>                 (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
>> saleEx_distinct,
>>                 (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
>> acc_grp_distinct,
>>                 (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
>> accM_grp_distinct;
>>
>> -- group and generate
>> grouped = group research by id;
>> final_data = FOREACH grouped GENERATE group as id, SUM(research.sale)
>> as total_sales, SUM(research.saleEx) as total_saleEx, .....;
>>
>> Or perhaps you want the pivot of that:
>>
>> research = LOAD....
>> grouped = group research by (id, org_type, dept_type);
>> final_data = FOREACH grouped GENERATE
>>   FLATTEN(group) as (id, org_type, dept_type),
>>  (group.org_type == 'S' AND group.dept_type == 1? SUM(research.cnt) :0) AS
>> sale,
>> .....
>>
>> Hope this helps.
>>
>> -Dmitriy
>>
>> On Thu, May 6, 2010 at 9:33 PM, Syed Wasti <md...@hotmail.com> wrote:
>>> Hi,
>>> Trying to make my script execute happily, it wont stop throwing errors.
>>> It works if I don¹t group my data. But once I group,
>>> It starts with;
>>> ERROR 1000: Error during parsing. Invalid alias: id  in {group:
>>> long,edu_affl: {id
>>> I got rid of it,  then it complains about Invalid alias: org_type,
>>> It keeps going and at the end it complains about mismatch in comparison, as
>>> the left side is a bag and right side is chararray. How ?
>>>
>>> Not sure, where I am going wrong, need help;
>>>
>>>
>>> research =      LOAD 'count'
>>>                USING PigStorage('\t')
>>>                as (id: long, org_type: chararray, dept_type: int,
>>>                cnt: int, cnt_distinct: int);
>>>
>>> final_grp =     GROUP research BY id;
>>>
>>> Final_table =   FOREACH final_grp
>>>                GENERATE id,
>>>                (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
>>>                (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
>>>                (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
>>>                (org_type == 'C' AND dept_type == 4?cnt:0) AS
>>> acc_grp_manual,
>>>                (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
>>> sale_distinct,
>>>                (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
>>> saleEx_distinct,
>>>                (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
>>> acc_grp_distinct,
>>>                (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
>>> accM_grp_distinct;
>>>
>>> STORE final_table INTO 'final_count' USING PigStorage();
>>>
>>> Regards
>>> Syed
>>>
>>
>
>
>

Re: GROUP Error

Posted by Syed Wasti <md...@hotmail.com>.
Clear as mud, nice explanation. Thanks.

Mean while I want to throw another basic question on how relations work,
After you DUMP or STORE the results, what happens to all the relations
before DUMP or STORE. In a grunt shell I can still use the relations and do
something else with them. Are all the relations saved for that session ?
OR is the data for relations saved under tmp ?
Need some clarity on this.


On 5/6/10 10:05 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:

> It's id => group, org_type => research.org_type and dept_type =>
> research.dept_type
> But that still doesn't get you where you want to go.. read on.
> 
> Here's a quick explanation.
> When you group a relation, the result is a new relation with two
> columns: "group" and the name of the original relation.
> The group column has the schema of what you grouped by. If you grouped
> by an integer column, for example, the type will be int.
> If you grouped by a tuple of several columns -- "foo = group bar by
> (a, b);" -- the "group" column will be a tuple with two fields, a and
> b.
> They can be retrieved by flattening "group", or by directly accessing
> them: "group.a, group.b".
> 
> The second column will be named after the original relation, and
> contain a *bag* of all the rows in the original relation that match
> the corresponding group. The rows are unaltered -- so they still
> contain columns you grouped by, for example.  The way bags work is,
> referring to mybag.some_field essentially means "for each tuple in the
> bag, give me some_field in that tuple".  So you can do things like
> 
> group_counts = foreach grouped_data generate
>   group as key,  -- the key you grouped on
>   COUNT(original_data), -- the number of rows that have that key
>   MAX(original_data.cost); -- the maximum cost of items that have that key
> 
> Note that all the functions in this example are aggregates. That's
> because they are things we can do to a *collection of values*. It
> doesn't make sense to do something like what you are trying -- check
> if some field is equal to S, for example.  That's because there may be
> multiple rows in your original data that have the same id, but
> different org_type. What can pig give you when it encounters a row
> like below (I represent it with field names for clarity):
> 
> ( group=1, { (  id=1, org_type='S', cnt=3 ), ( id=1, org_type='L", cnt=4)}
> 
> Should it interpret org_type as S, or L?
> 
> I suspect that what you actually want is something like this:
> 
> -- load
> research =      LOAD 'count'
>                 USING PigStorage('\t')
>                 as (id: long, org_type: chararray, dept_type: int,
>                 cnt: int, cnt_distinct: int);
> 
> -- break out the counts
> research = FOREACH research
>                 GENERATE id,
>                 (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
>                 (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
>                 (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
>                 (org_type == 'C' AND dept_type == 4?cnt:0) AS acc_grp_manual,
>                 (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
> sale_distinct,
>                 (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
> saleEx_distinct,
>                 (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
> acc_grp_distinct,
>                 (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
> accM_grp_distinct;
> 
> -- group and generate
> grouped = group research by id;
> final_data = FOREACH grouped GENERATE group as id, SUM(research.sale)
> as total_sales, SUM(research.saleEx) as total_saleEx, .....;
> 
> Or perhaps you want the pivot of that:
> 
> research = LOAD....
> grouped = group research by (id, org_type, dept_type);
> final_data = FOREACH grouped GENERATE
>   FLATTEN(group) as (id, org_type, dept_type),
>  (group.org_type == 'S' AND group.dept_type == 1? SUM(research.cnt) :0) AS
> sale,
> .....
> 
> Hope this helps.
> 
> -Dmitriy
> 
> On Thu, May 6, 2010 at 9:33 PM, Syed Wasti <md...@hotmail.com> wrote:
>> Hi,
>> Trying to make my script execute happily, it wont stop throwing errors.
>> It works if I don¹t group my data. But once I group,
>> It starts with;
>> ERROR 1000: Error during parsing. Invalid alias: id  in {group:
>> long,edu_affl: {id
>> I got rid of it,  then it complains about Invalid alias: org_type,
>> It keeps going and at the end it complains about mismatch in comparison, as
>> the left side is a bag and right side is chararray. How ?
>> 
>> Not sure, where I am going wrong, need help;
>> 
>> 
>> research =      LOAD 'count'
>>                USING PigStorage('\t')
>>                as (id: long, org_type: chararray, dept_type: int,
>>                cnt: int, cnt_distinct: int);
>> 
>> final_grp =     GROUP research BY id;
>> 
>> Final_table =   FOREACH final_grp
>>                GENERATE id,
>>                (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
>>                (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
>>                (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
>>                (org_type == 'C' AND dept_type == 4?cnt:0) AS
>> acc_grp_manual,
>>                (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
>> sale_distinct,
>>                (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
>> saleEx_distinct,
>>                (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
>> acc_grp_distinct,
>>                (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
>> accM_grp_distinct;
>> 
>> STORE final_table INTO 'final_count' USING PigStorage();
>> 
>> Regards
>> Syed
>> 
> 



Re: GROUP Error

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It's id => group, org_type => research.org_type and dept_type =>
research.dept_type
But that still doesn't get you where you want to go.. read on.

Here's a quick explanation.
When you group a relation, the result is a new relation with two
columns: "group" and the name of the original relation.
The group column has the schema of what you grouped by. If you grouped
by an integer column, for example, the type will be int.
If you grouped by a tuple of several columns -- "foo = group bar by
(a, b);" -- the "group" column will be a tuple with two fields, a and
b.
They can be retrieved by flattening "group", or by directly accessing
them: "group.a, group.b".

The second column will be named after the original relation, and
contain a *bag* of all the rows in the original relation that match
the corresponding group. The rows are unaltered -- so they still
contain columns you grouped by, for example.  The way bags work is,
referring to mybag.some_field essentially means "for each tuple in the
bag, give me some_field in that tuple".  So you can do things like

group_counts = foreach grouped_data generate
  group as key,  -- the key you grouped on
  COUNT(original_data), -- the number of rows that have that key
  MAX(original_data.cost); -- the maximum cost of items that have that key

Note that all the functions in this example are aggregates. That's
because they are things we can do to a *collection of values*. It
doesn't make sense to do something like what you are trying -- check
if some field is equal to S, for example.  That's because there may be
multiple rows in your original data that have the same id, but
different org_type. What can pig give you when it encounters a row
like below (I represent it with field names for clarity):

( group=1, { (  id=1, org_type='S', cnt=3 ), ( id=1, org_type='L", cnt=4)}

Should it interpret org_type as S, or L?

I suspect that what you actually want is something like this:

-- load
research =      LOAD 'count'
                USING PigStorage('\t')
                as (id: long, org_type: chararray, dept_type: int,
                cnt: int, cnt_distinct: int);

-- break out the counts
research = FOREACH research
                GENERATE id,
                (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
                (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
                (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
                (org_type == 'C' AND dept_type == 4?cnt:0) AS acc_grp_manual,
                (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
sale_distinct,
                (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
saleEx_distinct,
                (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
acc_grp_distinct,
                (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
accM_grp_distinct;

-- group and generate
grouped = group research by id;
final_data = FOREACH grouped GENERATE group as id, SUM(research.sale)
as total_sales, SUM(research.saleEx) as total_saleEx, .....;

Or perhaps you want the pivot of that:

research = LOAD....
grouped = group research by (id, org_type, dept_type);
final_data = FOREACH grouped GENERATE
  FLATTEN(group) as (id, org_type, dept_type),
 (group.org_type == 'S' AND group.dept_type == 1? SUM(research.cnt) :0) AS sale,
.....

Hope this helps.

-Dmitriy

On Thu, May 6, 2010 at 9:33 PM, Syed Wasti <md...@hotmail.com> wrote:
> Hi,
> Trying to make my script execute happily, it wont stop throwing errors.
> It works if I don¹t group my data. But once I group,
> It starts with;
> ERROR 1000: Error during parsing. Invalid alias: id  in {group:
> long,edu_affl: {id
> I got rid of it,  then it complains about Invalid alias: org_type,
> It keeps going and at the end it complains about mismatch in comparison, as
> the left side is a bag and right side is chararray. How ?
>
> Not sure, where I am going wrong, need help;
>
>
> research =      LOAD 'count'
>                USING PigStorage('\t')
>                as (id: long, org_type: chararray, dept_type: int,
>                cnt: int, cnt_distinct: int);
>
> final_grp =     GROUP research BY id;
>
> Final_table =   FOREACH final_grp
>                GENERATE id,
>                (org_type == 'S' AND dept_type == 1?cnt:0) AS sale,
>                (org_type == 'S' AND dept_type == 2?cnt:0) AS saleEx,
>                (org_type == 'C' AND dept_type == 2?cnt:0) AS acc_grp,
>                (org_type == 'C' AND dept_type == 4?cnt:0) AS
> acc_grp_manual,
>                (org_type == 'S' AND dept_type == 1?cnt_distinct:0) AS
> sale_distinct,
>                (org_type == 'S' AND dept_type == 2?cnt_distinct:0) AS
> saleEx_distinct,
>                (org_type == 'C' AND dept_type == 2?cnt_distinct:0) AS
> acc_grp_distinct,
>                (org_type == 'C' AND dept_type == 4?cnt_distinct:0) AS
> accM_grp_distinct;
>
> STORE final_table INTO 'final_count' USING PigStorage();
>
> Regards
> Syed
>