Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2010/02/02 19:20:18 UTC

[jira] Updated: (PIG-723) Pig generates incorrect schema for generated bags after FOREACH.

     [ https://issues.apache.org/jira/browse/PIG-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-723:
-------------------------------

    Description: 
grunt> rf_src = LOAD 'rf_test.txt' USING PigStorage(',') AS (lhs:chararray, rhs:chararray, r:float, p:float, c:float);
grunt> rf_grouped = GROUP rf_src BY rhs;
grunt> lhs_grouped = FOREACH rf_grouped GENERATE group as rhs, rf_src.(lhs, r) as lhs, MAX(rf_src.p) as p, MAX(rf_src.c) AS c;
grunt> describe lhs_grouped;
lhs_grouped: {rhs: chararray,lhs: {lhs: chararray,r: float},p: float,c: float}

I think it should be:
lhs_grouped: {rhs: chararray,lhs: {(lhs: chararray,r: float)},p: float,c: float}

Because of this, we cannot UNION the two relations: a union of incompatible schemas loses all schema information, which makes further processing impossible.

This is what we want to UNION with:

grunt> asrc = LOAD 'atest.txt' USING PigStorage(',') AS (rhs:chararray, a:int);
grunt> aa = FOREACH asrc GENERATE rhs, (bag{tuple(chararray,float)}) null as lhs, -10F as p, -10F as c;
grunt> describe aa;
aa: {rhs: chararray,lhs: {(chararray,float)},p: float,c: float}
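
The UNION step itself is not shown in the session above. A minimal sketch of what it would presumably look like, assuming the aliases lhs_grouped and aa defined above; the alias combined is hypothetical and does not appear in the original session, and no output is shown because none was reported:

```pig
-- lhs_grouped.lhs is described as {lhs: chararray,r: float}
-- aa.lhs          is described as {(chararray,float)}
-- The bag schemas differ, so the two inputs are treated as incompatible.
combined = UNION lhs_grouped, aa;

-- Per the report, schema information is lost at this point,
-- so describe is the quickest way to check the result.
describe combined;
```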

If there is something wrong with what I am trying to do, please let me know.


       Priority: Major  (was: Critical)

Not sure why this issue was marked as critical.

> Pig generates incorrect schema for generated bags after FOREACH.
> ----------------------------------------------------------------
>
>                 Key: PIG-723
>                 URL: https://issues.apache.org/jira/browse/PIG-723
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.1.0
>         Environment: Linux
> $pig --version
> Apache Pig version 0.1.0-dev (r750430)
> compiled Mar 07 2009, 09:20:13
>            Reporter: Dhruv M

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.