Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2010/08/04 22:42:16 UTC

[jira] Created: (PIG-1536) use same logic for merging inner schemas in "default union" and "union onschema"

use same logic for merging inner schemas in "default union" and "union onschema"
--------------------------------------------------------------------------------

                 Key: PIG-1536
                 URL: https://issues.apache.org/jira/browse/PIG-1536
             Project: Pig
          Issue Type: Task
            Reporter: Thejas M Nair
             Fix For: 0.9.0


We should consider using the same logic for merging inner schemas in the two different types of union.

In the case of 'default union', the inner schemas of bags/tuples are merged by position if the number of fields is the same and the corresponding types are compatible.

In the case of 'union onschema', tuples/bags with different inner schemas are considered to be incompatible types.
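
The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustrative model, not Pig's actual Schema code: the representation of a schema as a list of (name, type) pairs, the function names, the numeric widening order, and the choice to keep the first input's field names are all assumptions made for the example.

```python
# Hypothetical model: a schema is a list of (name, type) pairs; the type of a
# tuple/bag field is itself a nested list. This is not Pig's Schema API.

# Assumed widening order used to pick a merged type for compatible numerics.
NUMERIC_ORDER = ["int", "long", "float", "double"]


def merge_type(t1, t2):
    """Return a merged type for two compatible types, or None if incompatible."""
    if isinstance(t1, list) and isinstance(t2, list):
        return merge_by_position(t1, t2)  # nested inner schema
    if isinstance(t1, list) or isinstance(t2, list):
        return None  # complex vs. simple type
    if t1 == t2:
        return t1
    if t1 in NUMERIC_ORDER and t2 in NUMERIC_ORDER:
        return max(t1, t2, key=NUMERIC_ORDER.index)  # wider numeric type
    return None


def merge_by_position(s1, s2):
    """'default union' style: merge inner schemas field by field, by position."""
    if len(s1) != len(s2):
        return None
    merged = []
    for (name1, t1), (_name2, t2) in zip(s1, s2):
        t = merge_type(t1, t2)
        if t is None:
            return None
        merged.append((name1, t))  # assumption: keep the first input's names
    return merged


def merge_onschema_inner(s1, s2):
    """'union onschema' style: differing inner schemas are incompatible."""
    return s1 if s1 == s2 else None


inner1 = [("a", "int"), ("c", "long")]
inner2 = [("a", "int"), ("b", "int")]
print(merge_by_position(inner1, inner2))
print(merge_onschema_inner(inner1, inner2))
```

Under these assumptions, the positional merge accepts the two inner schemas and yields a single merged one, while the onschema-style check rejects them.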


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1536) use same logic for merging inner schemas in "default union" and "union onschema"

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895410#action_12895410 ] 

Thejas M Nair commented on PIG-1536:
------------------------------------


The way 'default union' deals with columns of different but compatible types in the same position is not right. It creates a merged schema by choosing a merged type, but no cast is applied to convert the rows to this type.
For example:

{code}
grunt> l1 = load '/tmp/f1' as (a : chararray, t (a : int, c : long) );
grunt> l2 = load '/tmp/f1' as (a : chararray, t (a : int, b : int) ); 
grunt> u = union l1, l2;                                              
grunt> describe u;                                                    
u: {a: chararray,t: (a: int,c: long)}

-- In the result of u, only the rows originating from l1 will correspond to the schema shown in describe.

MapReduce node 1-206
Map Plan
u: Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-203
|
|---u: Union[bag] - 1-202
    |
    |---l1: New For Each(false,false)[bag] - 1-195
    |   |   |
    |   |   Cast[chararray] - 1-192
    |   |   |
    |   |   |---Project[bytearray][0] - 1-191
    |   |   |
    |   |   Cast[tuple:(int,long)] - 1-194
    |   |   |
    |   |   |---Project[bytearray][1] - 1-193
    |   |
    |   |---l1: Load(/tmp/f1:org.apache.pig.builtin.PigStorage) - 1-190
    |
    |---l2: New For Each(false,false)[bag] - 1-201
        |   |
        |   Cast[chararray] - 1-198
        |   |
        |   |---Project[bytearray][0] - 1-197
        |   |
        |   Cast[tuple:(int,int)] - 1-200
        |   |
        |   |---Project[bytearray][1] - 1-199
        |
        |---l2: Load(/tmp/f1:org.apache.pig.builtin.PigStorage) - 1-196--------
Global sort: false
----------------

{code}
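
The mismatch in the plan above can be restated as a small check. This is a minimal Python sketch rather than anything Pig runs: the tuple types are copied from the describe output and the Cast operators in the plan, and the function name is hypothetical.

```python
# Inner tuple type reported by 'describe u' for field t.
merged = ("int", "long")

# Inner tuple type each branch is actually cast to, per the Cast operators
# in the plan (Cast[tuple:(int,long)] for l1, Cast[tuple:(int,int)] for l2).
branch_casts = {"l1": ("int", "long"), "l2": ("int", "int")}


def branches_needing_cast(merged_types, casts):
    """Return the branches whose rows do not already match the merged type."""
    return sorted(name for name, types in casts.items() if types != merged_types)


print(branches_needing_cast(merged, branch_casts))  # prints ['l2']
```

Only l2 is flagged, which matches the observation that rows from l2 reach the union without being converted to the merged type.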



[jira] Assigned: (PIG-1536) use same logic for merging inner schemas in "default union" and "union onschema"

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-1536:
-------------------------------

    Assignee: Alan Gates
