Posted to dev@pig.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2017/06/20 21:11:09 UTC

[jira] [Commented] (PIG-5201) Null handling on FLATTEN

    [ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056468#comment-16056468 ] 

Koji Noguchi commented on PIG-5201:
-----------------------------------

bq.  Have asked Koji Noguchi to check with couple of internal users who are both Pig and data pipeline experts and will be affected by this.

From the users, I learned that there's a common pattern that can easily break if FLATTEN(null-bag) starts dropping records as I proposed.

Basically, their code looks like this:
{code}
...
C = FOREACH B GENERATE record_type, FLATTEN(type_a_bag), FLATTEN(type_b_bag); 
...
{code}
When record_type is 'a', type_b_bag is null, and vice versa.
Instead of checking the record_type up front, users simply flatten both bags and examine the record_type later.
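
To make the breakage concrete, here is a rough Python model of the pattern (not Pig code, and not how Pig implements FLATTEN). It assumes that several FLATTENs in one GENERATE form a cross product, that the current behavior turns a null bag into a single null value, and that the proposed behavior would treat a null bag like an empty one. The function names, the `drop_null` switch, and the sample rows are made up for illustration, and the model ignores how many null fields real Pig emits for a bag with a multi-field schema.

```python
from itertools import product


def flatten_bag(bag, drop_null):
    """Rows that one FLATTEN(bag) contributes in this simplified model."""
    if bag is None:
        # Current behavior (assumed): a null bag yields one null value.
        # Proposed behavior (assumed): a null bag yields nothing, like an empty bag.
        return [] if drop_null else [None]
    return list(bag)


def foreach_flatten(records, drop_null):
    """Model of: FOREACH B GENERATE record_type, FLATTEN(type_a_bag), FLATTEN(type_b_bag);"""
    out = []
    for record_type, type_a_bag, type_b_bag in records:
        a_rows = flatten_bag(type_a_bag, drop_null)
        b_rows = flatten_bag(type_b_bag, drop_null)
        # Two FLATTENs in the same GENERATE combine as a cross product,
        # so an empty contribution from either side removes the record.
        for a, b in product(a_rows, b_rows):
            out.append((record_type, a, b))
    return out


# Hypothetical input: each record fills exactly one of the two bags.
B = [
    ("a", ["a1", "a2"], None),
    ("b", None, ["b1"]),
]

current = foreach_flatten(B, drop_null=False)
proposed = foreach_flatten(B, drop_null=True)

print(current)   # records survive, with null in the unused column
print(proposed)  # every record is dropped
```

Under these assumptions, every input record has one null bag, so the dropping variant leaves nothing for the later record_type check to examine.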

I hate inconsistency and I hate being wrong (and Rohini being right), but it looks like I will have to keep the current behavior, where FLATTEN(null-bag) does _not_ drop the record.

> Null handling on FLATTEN
> ------------------------
>
>                 Key: PIG-5201
>                 URL: https://issues.apache.org/jira/browse/PIG-5201
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, pig-5201-v02.patch, pig-5201-v03.patch
>
>
> Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect results.
> Test code/script to follow.


