Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2010/04/12 19:57:53 UTC

[jira] Created: (PIG-1371) Pig should handle deep casting of complex types

Pig should handle deep casting of complex types 
------------------------------------------------

                 Key: PIG-1371
                 URL: https://issues.apache.org/jira/browse/PIG-1371
             Project: Pig
          Issue Type: Bug
            Reporter: Pradeep Kamath
             Fix For: 0.8.0


Consider input data in BinStorage format that has a field of bag type, bg:{t:(i:int)}. If the schema in the load statement declares this field as bg:{t:(c:chararray)}, Pig currently treats the field as having the declared type (bg:{t:(c:chararray)}), but it makes no deep cast from bag of int (the real data) to bag of chararray (the user-specified schema).
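
To make the symptom concrete, here is a toy model in Python. It is not Pig code: a bag is modelled as a list of tuples, and the names stored_bag, declared_python_type and load_without_deep_cast are hypothetical, chosen only for illustration. It shows values that stay ints even though the declared schema says chararray.

```python
# Toy model (not Pig code): a bag is a list of tuples.
stored_bag = [(1,), (2,), (3,)]   # real data, bg:{t:(i:int)}
declared_python_type = str        # stands in for the declared chararray


def load_without_deep_cast(bag):
    # Models the behavior described above: the bag passes through with no cast.
    return bag


loaded = load_without_deep_cast(stored_bag)

# Collect values whose runtime type disagrees with the declared schema.
mismatched = [
    value
    for tup in loaded
    for value in tup
    if not isinstance(value, declared_python_type)
]
print(mismatched)
```

Every value ends up in the mismatched list, which is the gap a deep cast would close.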

There are currently two issues:
1) The TypeCastInserter only compares the byte 'type' of the loader-presented schema and the user-specified schema to decide whether to introduce a cast. In the above case both schemas have the type "bag", so no cast is inserted. This check has to be extended to consider the full FieldSchema (with inner subschema) in order to decide whether a cast is needed.
2) POCast should be changed to handle casting a complex type to the type specified in the user-supplied FieldSchema. There is one issue to be considered here: if the user specified the cast type to be bg:{t:(i:int, j:int)} and the real data had only one field, what should the result of the cast be?
 * A bag with two fields, the int field and a null? In this approach Pig assumes the lone field in the data is the first field, which might be incorrect if it is in fact the second field.
 * A null bag to indicate that the bag is of unknown value. This is the one I personally prefer.
 * The cast throws an IncompatibleCastException.
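
The two issues above can be sketched in Python as a rough model. This is not the TypeCastInserter or POCast implementation: the FieldSchema dataclass, the function names, the IncompatibleCastError class and the on_mismatch policy names ("pad", "null", "raise") are all assumptions made for this sketch. The first part contrasts a top-level type check with a recursive one; the second part shows the three candidate outcomes when the tuple arity does not match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSchema:
    """Hypothetical stand-in for a field schema: a type name plus inner fields."""
    type: str          # "int", "chararray", "tuple" or "bag"
    inner: tuple = ()  # nested FieldSchema objects for tuple and bag


def needs_cast_shallow(loader, user):
    # Issue 1 as described: only the top-level type is compared.
    return loader.type != user.type


def needs_cast_deep(loader, user):
    # Extended check: compare the type and, recursively, every inner field.
    if loader.type != user.type or len(loader.inner) != len(user.inner):
        return True
    return any(needs_cast_deep(l, u) for l, u in zip(loader.inner, user.inner))


class IncompatibleCastError(Exception):
    """Hypothetical analogue of the IncompatibleCastException option."""


SCALAR_CASTS = {"int": int, "chararray": str}


def _deep_cast(value, target, pad):
    if value is None:
        return None
    if target.type == "bag":
        (tuple_schema,) = target.inner
        return [_deep_cast(t, tuple_schema, pad) for t in value]
    if target.type == "tuple":
        missing = len(target.inner) - len(value)
        if missing > 0 and pad:
            # Option 1: assume the data holds the leading fields, pad with nulls.
            value = tuple(value) + (None,) * missing
        elif missing != 0:
            raise IncompatibleCastError(
                f"expected {len(target.inner)} fields, got {len(value)}")
        return tuple(_deep_cast(v, s, pad) for v, s in zip(value, target.inner))
    return SCALAR_CASTS[target.type](value)


def cast_bag(bag, target, on_mismatch="null"):
    """Cast a bag to target; on_mismatch is 'pad', 'null' or 'raise'."""
    try:
        return _deep_cast(bag, target, pad=(on_mismatch == "pad"))
    except IncompatibleCastError:
        if on_mismatch == "null":
            return None  # Option 2: a null bag, the value is unknown.
        raise            # Option 3: surface the error to the caller.


def bag_of(*field_types):
    # Convenience builder for bg:{t:(...)} style schemas.
    fields = tuple(FieldSchema(t) for t in field_types)
    return FieldSchema("bag", (FieldSchema("tuple", fields),))


loader_schema = bag_of("int")
user_schema = bag_of("chararray")
print(needs_cast_shallow(loader_schema, user_schema))  # both are "bag"
print(needs_cast_deep(loader_schema, user_schema))     # inner types differ
print(cast_bag([(1,), (2,)], user_schema))
print(cast_bag([(1,)], bag_of("int", "int"), on_mismatch="pad"))
print(cast_bag([(1,)], bag_of("int", "int"), on_mismatch="null"))
```

In this model the "pad" policy bakes in the positional assumption called out in the first option, while "null" and "raise" avoid guessing which field the data actually holds.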


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (PIG-1371) Pig should handle deep casting of complex types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1371:
--------------------------------

    Attachment: PIG-1371-partial.patch

partial patch - attaching here for future reference

> Pig should handle deep casting of complex types 
> ------------------------------------------------
>
>                 Key: PIG-1371
>                 URL: https://issues.apache.org/jira/browse/PIG-1371
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Pradeep Kamath
>             Fix For: 0.8.0
>
>         Attachments: PIG-1371-partial.patch


[jira] Updated: (PIG-1371) Pig should handle deep casting of complex types

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1371:
----------------------------

    Fix Version/s: 0.9.0

> Pig should handle deep casting of complex types 
> ------------------------------------------------
>
>                 Key: PIG-1371
>                 URL: https://issues.apache.org/jira/browse/PIG-1371
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Pradeep Kamath
>            Assignee: Alan Gates
>             Fix For: 0.9.0
>
>         Attachments: PIG-1371-partial.patch


[jira] Updated: (PIG-1371) Pig should handle deep casting of complex types

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1371:
--------------------------------

    Fix Version/s:     (was: 0.8.0)

It does not look like we will have time to do this in 0.8.0

> Pig should handle deep casting of complex types 
> ------------------------------------------------
>
>                 Key: PIG-1371
>                 URL: https://issues.apache.org/jira/browse/PIG-1371
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Pradeep Kamath
>            Assignee: Richard Ding
>         Attachments: PIG-1371-partial.patch


[jira] Assigned: (PIG-1371) Pig should handle deep casting of complex types

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-1371:
-------------------------------

    Assignee: Alan Gates  (was: Richard Ding)

> Pig should handle deep casting of complex types 
> ------------------------------------------------
>
>                 Key: PIG-1371
>                 URL: https://issues.apache.org/jira/browse/PIG-1371
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Pradeep Kamath
>            Assignee: Alan Gates
>         Attachments: PIG-1371-partial.patch


[jira] Assigned: (PIG-1371) Pig should handle deep casting of complex types

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1371:
-----------------------------------

    Assignee: Richard Ding

Richard, can you take a look and see how feasible this is for 0.8.0, thanks

> Pig should handle deep casting of complex types 
> ------------------------------------------------
>
>                 Key: PIG-1371
>                 URL: https://issues.apache.org/jira/browse/PIG-1371
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Pradeep Kamath
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1371-partial.patch