Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2009/06/13 00:35:07 UTC

[jira] Created: (PIG-847) Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag

Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag
---------------------------------------------------------------------------------------------------------------------

                 Key: PIG-847
                 URL: https://issues.apache.org/jira/browse/PIG-847
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.2.1
            Reporter: Pradeep Kamath


Currently Pig interprets the result type of a relation as a bag, but the schema of the relation directly contains the field schemas describing the fields in the relation's tuples. By contrast, when a UDF returns a bag, when there is a bag in the input data, or when the user creates a bag constant, the schema of the bag has a single field schema, which is that of the tuple; the tuple's schema in turn holds the types of the fields. To access fields from such a bag directly, using something like <bagname>.<fieldname> or <bag>.<fieldposition>, the schema of the bag must have twoLevelAccessRequired set to true so that Pig's type system can traverse the tuple schema and get to the field in question. This is confusing; we should see whether we can avoid needing this extra flag. One possible solution is to treat all bags the same way, whether they represent relations or real bags. Another is to introduce a special "relation" datatype for the result type of a relation, so that the bag type would be used only for true bags. In that case, a bag schema would always need to contain a tuple schema describing the fields.
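
To make the two schema shapes concrete, here is a minimal toy model in Python. It is not Pig's actual Schema API: the classes FieldSchema and Schema, the function resolve, and the aliases "t", "name" and "age" are all hypothetical names chosen for illustration. It only sketches the behaviour described above, assuming that field lookup descends through the tuple level when, and only when, the flag is set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldSchema:
    """Toy stand-in for one field: alias, type name, optional inner schema."""
    alias: str
    type: str
    schema: Optional["Schema"] = None


@dataclass
class Schema:
    """Toy stand-in for a schema: ordered fields plus the flag under discussion."""
    fields: List[FieldSchema] = field(default_factory=list)
    two_level_access_required: bool = False

    def get_field(self, alias: str) -> Optional[FieldSchema]:
        # Look only at the top-level fields of this schema.
        for fs in self.fields:
            if fs.alias == alias:
                return fs
        return None


def resolve(bag_schema: Schema, alias: str) -> Optional[FieldSchema]:
    """Resolve <bag>.<alias> in this toy model (an assumption, not Pig's code)."""
    if bag_schema.two_level_access_required:
        # Step through the single tuple field to reach the real fields.
        inner = bag_schema.fields[0].schema
        return inner.get_field(alias) if inner is not None else None
    # Without the flag, only the bag schema's own fields are visible.
    return bag_schema.get_field(alias)


# A relation: the field schemas sit directly in the relation's schema.
relation = Schema([FieldSchema("name", "chararray"), FieldSchema("age", "int")])

# A "true" bag: one tuple field, whose schema holds the actual fields.
true_bag = Schema([
    FieldSchema("t", "tuple", Schema([
        FieldSchema("name", "chararray"),
        FieldSchema("age", "int"),
    ])),
])

found_in_relation = resolve(relation, "age")      # found directly
missing_without_flag = resolve(true_bag, "age")   # None: hidden behind the tuple

true_bag.two_level_access_required = True
found_with_flag = resolve(true_bag, "age")        # found after descending
```

In this sketch the same lookup on the same logical data succeeds or fails depending only on the flag, which is the source of confusion the issue describes.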

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (PIG-847) Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding reassigned PIG-847:
--------------------------------

    Assignee: Richard Ding

> Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-847
>                 URL: https://issues.apache.org/jira/browse/PIG-847
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.1
>            Reporter: Pradeep Kamath
>            Assignee: Richard Ding



[jira] Assigned: (PIG-847) Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-847:
------------------------------

    Assignee: Alan Gates  (was: Richard Ding)

> Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-847
>                 URL: https://issues.apache.org/jira/browse/PIG-847
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Alan Gates



[jira] Updated: (PIG-847) Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-847:
---------------------------

    Fix Version/s: 0.9.0

> Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-847
>                 URL: https://issues.apache.org/jira/browse/PIG-847
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Alan Gates
>             Fix For: 0.9.0
