Posted to user@pig.apache.org by Michael West <qu...@gmail.com> on 2013/03/02 17:11:27 UTC
how to stop dereferencing after join - error setting schema
I would like to set the schema after joining so that I do not have to always dereference. However, I receive an error when I try this. How can I resolve this error?
Pig version: 0.11
Error message:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "tuple_0:tuple(name:chararray,age:int,gpa:float)", infered is "A::gpa:float"
Trivial sample code to demonstrate issue:
/*
studenttab10k file from
http://people.apache.org/~hashutosh/
*/
A = LOAD '/Users/mike/Documents/code/hadoop/pig/data/studenttab10k' AS (name:chararray, age:int, gpa:float);
B = LOAD '/Users/mike/Documents/code/hadoop/pig/data/studenttab10k' AS (name:chararray, age:int, gpa:float);
C = JOIN A by name, B by name;
ILLUSTRATE C;
D = FOREACH C GENERATE A::name, A::age, A::gpa AS (name:chararray, age:int, gpa:float);
DESCRIBE D;
ILLUSTRATE C output:
---------------------------------------------------------
| A | name:chararray | age:int | gpa:float |
---------------------------------------------------------
| | xavier steinbeck | 58 | 2.99 |
| | xavier steinbeck | 23 | 0.59 |
---------------------------------------------------------
---------------------------------------------------------
| B | name:chararray | age:int | gpa:float |
---------------------------------------------------------
| | xavier steinbeck | 58 | 2.99 |
| | xavier steinbeck | 23 | 0.59 |
---------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------
| C | A::name:chararray | A::age:int | A::gpa:float | B::name:chararray | B::age:int | B::gpa:float |
---------------------------------------------------------------------------------------------------------------------------
| | xavier steinbeck | 58 | 2.99 | xavier steinbeck | 58 | 2.99 |
| | xavier steinbeck | 58 | 2.99 | xavier steinbeck | 23 | 0.59 |
| | xavier steinbeck | 23 | 0.59 | xavier steinbeck | 58 | 2.99 |
| | xavier steinbeck | 23 | 0.59 | xavier steinbeck | 23 | 0.59 |
---------------------------------------------------------------------------------------------------------------------------
Re: how to stop dereferencing after join - error setting schema
Posted by Michael West <qu...@gmail.com>.
That works.
D = FOREACH C GENERATE A::name AS name, A::age AS age, A::gpa AS gpa;
Thanks!
On Mar 2, 2013, at 8:50 AM, Bill Graham <bi...@gmail.com> wrote:
> Each field needs to be dereferenced individually:
>
> A::name AS name, A::age AS age...
Re: how to stop dereferencing after join - error setting schema
Posted by Bill Graham <bi...@gmail.com>.
Each field needs to be dereferenced individually:
A::name AS name, A::age AS age...
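For anyone finding this later, here is a minimal sketch of the full corrected script, reusing the paths and field names from the original post. Judging by the error message, an AS clause attaches only to the expression immediately before it, so in the original statement the whole tuple schema was applied to A::gpa alone, which is why Pig reported a declared tuple against an inferred float. This is a sketch under that reading, not a statement about every Pig version.

```pig
-- Same inputs as in the original post.
A = LOAD '/Users/mike/Documents/code/hadoop/pig/data/studenttab10k' AS (name:chararray, age:int, gpa:float);
B = LOAD '/Users/mike/Documents/code/hadoop/pig/data/studenttab10k' AS (name:chararray, age:int, gpa:float);

-- After the join, every field carries its relation prefix (A::name, B::name, ...).
C = JOIN A BY name, B BY name;

-- Give each projected field its own AS clause; one AS renames one expression.
D = FOREACH C GENERATE A::name AS name, A::age AS age, A::gpa AS gpa;

-- The schema of D should now list name, age and gpa without the A:: prefix.
DESCRIBE D;
```

Later statements can then refer to name, age and gpa directly instead of A::name and so on.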