Posted to dev@pig.apache.org by "Thejas M Nair (Commented) (JIRA)" <ji...@apache.org> on 2012/03/27 02:09:30 UTC

[jira] [Commented] (PIG-2537) Output from flatten with a null tuple input generating data inconsistent with the schema

    [ https://issues.apache.org/jira/browse/PIG-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239036#comment-13239036 ] 

Thejas M Nair commented on PIG-2537:
------------------------------------

Thoughts on the solution: Pig should continue to allow and expect null values for objects such as tuples. I think the problem needs to be solved in flatten, since flatten is the operator that promises a certain schema and then fails to produce data matching that schema when its input is null. However, this means flatten must know the expected schema of the tuples/bags at run time, i.e. the schema would need to be serialized and sent to the backend. That change would also not be backward compatible.
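A minimal sketch of the idea above in plain Java, assuming the schema-derived arity of the flattened tuple (3 for `a: tuple(x, y, z)`) has already been shipped to the backend. Tuples are modeled as `java.util.List<Object>`; the class and method names are hypothetical, and this is neither Pig's `Tuple` API nor the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: rows and tuples are modeled as List<Object>,
// not Pig's org.apache.pig.data.Tuple.
class SchemaAwareFlatten {

    // Flattens the tuple column at flattenIndex into the output row.
    // expectedArity is the number of fields the schema promises for that
    // tuple (assumed to be serialized and sent to the backend).
    static List<Object> flatten(List<Object> row, int flattenIndex, int expectedArity) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            Object field = row.get(i);
            if (i != flattenIndex) {
                out.add(field);                   // non-flattened columns pass through
            } else if (field == null) {
                for (int k = 0; k < expectedArity; k++) {
                    out.add(null);                // one null per schema field keeps positions stable
                }
            } else {
                out.addAll((List<?>) field);      // splice the tuple's fields in place
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> nullRow = Arrays.asList(null, "b_value", "c_value");
        List<Object> fullRow = Arrays.asList(Arrays.asList("x1", "y1", "z1"), "b_value", "c_value");
        System.out.println(flatten(nullRow, 0, 3)); // prints [null, null, null, b_value, c_value]
        System.out.println(flatten(fullRow, 0, 3)); // prints [x1, y1, z1, b_value, c_value]
    }
}
```

Both rows come out five fields wide, matching the schema of B. The sketch does not address a non-null tuple whose size differs from the schema; padding or rejecting such tuples would be a separate decision.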
                
> Output from flatten with a null tuple input generating data inconsistent with the schema
> ----------------------------------------------------------------------------------------
>
>                 Key: PIG-2537
>                 URL: https://issues.apache.org/jira/browse/PIG-2537
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Xuefu Zhang
>            Assignee: Daniel Dai
>             Fix For: 0.11
>
>         Attachments: PIG-2537-1.patch, PIG-2537-2.patch, PIG-2537-3.patch
>
>
> For the following Pig script:
> grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
> grunt> B = foreach A generate flatten( $0 ), b, c;
> grunt> describe B;
> B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}
> Alias B has a clear schema.
> However, on the backend, if $0 happens to be null for a row, the output tuple becomes something like
> (null, b_value, c_value), which is clearly inconsistent with the schema. This behaviour was confirmed by inspecting the Pig code.
> This inconsistency corrupts data because the remaining fields shift position. The expected output row would be
> (null, null, null, b_value, c_value).
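To make the position shift concrete, here is a small Java model of the reported behaviour, assuming rows are plain `java.util.List<Object>` values (not Pig's `Tuple` class) and that a null tuple is emitted as a single null field. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the reported behaviour. Rows and tuples are plain
// List<Object> values, not Pig's org.apache.pig.data.Tuple.
class NullFlattenShift {

    // Field names of B as reported by "describe B" (five fields).
    static final List<String> SCHEMA_B = Arrays.asList("a::x", "a::y", "a::z", "b", "c");

    // Flattens the tuple at flattenIndex, but passes a null tuple through
    // as a single null field, as described in the issue.
    static List<Object> naiveFlatten(List<Object> row, int flattenIndex) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            Object field = row.get(i);
            if (i == flattenIndex && field != null) {
                out.addAll((List<?>) field); // non-null tuple: splice its fields in
            } else {
                out.add(field);              // other columns, and a null tuple, add one field each
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList(null, "b_value", "c_value");
        List<Object> out = naiveFlatten(row, 0);
        System.out.println("schema width = " + SCHEMA_B.size() + ", row width = " + out.size());
        // Pair each value with the schema field that claims its position.
        for (int i = 0; i < out.size(); i++) {
            System.out.println(SCHEMA_B.get(i) + " = " + out.get(i));
        }
    }
}
```

Running it prints `schema width = 5, row width = 3` followed by `a::x = null`, `a::y = b_value` and `a::z = c_value`: anything reading the row by schema position finds b_value under a::y, and the positions for b and c do not exist at all.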

--
This message is automatically generated by JIRA.