Posted to dev@pig.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2017/12/09 06:47:00 UTC
[jira] [Resolved] (PIG-2537) Output from flatten with a null tuple input generating data inconsistent with the schema
[ https://issues.apache.org/jira/browse/PIG-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi resolved PIG-2537.
-------------------------------
Resolution: Duplicate
Fix Version/s: (was: 0.18.0)
Closing this jira as a duplicate of PIG-5201.
> Output from flatten with a null tuple input generating data inconsistent with the schema
> ----------------------------------------------------------------------------------------
>
> Key: PIG-2537
> URL: https://issues.apache.org/jira/browse/PIG-2537
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Daniel Dai
> Attachments: PIG-2537-1.patch, PIG-2537-2.patch, PIG-2537-3.patch
>
>
> For the following Pig script,
> grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
> grunt> B = foreach A generate flatten( $0 ), b, c;
> grunt> describe B;
> B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}
> Alias B has a clear schema.
> However, on the backend, if $0 happens to be null for a row, the output tuple becomes something like
> (null, b_value, c_value), which is obviously inconsistent with the schema. The behaviour is confirmed by inspection of the Pig code.
> This inconsistency corrupts data because the remaining fields shift position. The expected output row is something like
> (null, null, null, b_value, c_value).
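The position shift described in the report can be illustrated with a small model. This is only a sketch in Python, not Pig's actual implementation: the function names `flatten_observed` and `flatten_expected` and the `width` parameter are hypothetical, and the row layout (a, b, c) is taken from the script in the report.

```python
# Hypothetical model of flattening the first field of a row (a, b, c),
# where a is either a 3-field tuple or null (None). Not Pig's real code.

def flatten_observed(row):
    """Reported behaviour: a null tuple flattens to a single null field."""
    a, b, c = row
    prefix = (None,) if a is None else tuple(a)
    return prefix + (b, c)

def flatten_expected(row, width=3):
    """Expected behaviour: a null tuple is padded to the schema width."""
    a, b, c = row
    prefix = (None,) * width if a is None else tuple(a)
    return prefix + (b, c)

row = (None, "b_value", "c_value")
print(flatten_observed(row))  # 3 fields: b_value lands in the a::y position
print(flatten_expected(row))  # 5 fields: matches the schema of B
```

In the observed case the row has three fields while the schema of B declares five, so any consumer reading by position sees b_value where a::y should be.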
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)