Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2010/01/14 21:23:55 UTC

[jira] Commented: (PIG-1188) Padding nulls to the input tuple according to input schema

    [ https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800342#action_12800342 ] 

Alan Gates commented on PIG-1188:
---------------------------------

I don't think padding is a good idea.  We don't know which field in the record is missing.  We're just guessing that the last field is missing, when in fact it might be the first.  Then we've made the situation worse by inserting invalid data in all the fields.

I think the loader should either throw the record out, or make all fields in the record null.  This guarantees that we are not further propagating the error.  Then a warning can be issued that the record was invalid (I'm assuming even in the above proposal the loader would issue a warning.) 
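
To make the two alternatives concrete, here is a minimal sketch in Python. It is not Pig's loader code: the function name load_records, the on_invalid parameter, and the tab delimiter are all assumptions made for illustration. A record whose field count does not match the schema is either dropped or replaced by an all-null tuple, and a warning is issued in both cases.

```python
import warnings


def load_records(lines, num_fields, on_invalid="drop", sep="\t"):
    """Parse delimited lines against a schema of num_fields fields.

    Hypothetical helper, not part of Pig. Records with the wrong
    number of fields are handled according to on_invalid:
      "drop": the record is discarded
      "null": the record is kept, but every field is set to None
    """
    if on_invalid not in ("drop", "null"):
        raise ValueError("on_invalid must be 'drop' or 'null'")

    records = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if len(fields) == num_fields:
            records.append(tuple(fields))
            continue
        # We cannot tell which field is missing or extra, so we never
        # guess at positions; we only report the mismatch.
        warnings.warn(
            f"line {line_no}: expected {num_fields} fields, got {len(fields)}"
        )
        if on_invalid == "null":
            records.append((None,) * num_fields)
    return records
```

Under either policy, no value from a malformed record ends up in a field where it might not belong.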

> Padding nulls to the input tuple according to input schema
> ----------------------------------------------------------
>
>                 Key: PIG-1188
>                 URL: https://issues.apache.org/jira/browse/PIG-1188
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Daniel Dai
>             Fix For: 0.7.0
>
>
> Currently, the number of fields in the input tuple is determined by the data. When we have a schema, we should generate input data according to the schema, padding with nulls if necessary. Here is one example:
> Pig script:
> {code}
> a = load '1.txt' as (a0, a1);
> dump a;
> {code}
> Input file:
> {code}
> 1       2
> 1       2       3
> 1
> {code}
> Current result:
> {code}
> (1,2)
> (1,2,3)
> (1)
> {code}
> Desired result:
> {code}
> (1,2)
> (1,2)
> (1, null)
> {code}
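
For comparison, the behavior proposed in the issue description can be sketched as follows. This is an illustration in Python, not Pig code, and the name fit_to_schema is hypothetical. Extra fields are truncated and missing fields are padded with None, on the assumption that any missing fields are the trailing ones.

```python
def fit_to_schema(fields, num_fields):
    """Truncate extra fields and pad missing trailing fields with None.

    Hypothetical helper illustrating the proposal; it assumes that any
    missing fields are the last ones in the record.
    """
    fitted = list(fields[:num_fields])
    # A non-positive multiplier yields an empty list, so full or
    # truncated records are left unchanged here.
    fitted.extend([None] * (num_fields - len(fitted)))
    return tuple(fitted)


# The three input rows from the example, already split into fields.
rows = [["1", "2"], ["1", "2", "3"], ["1"]]
result = [fit_to_schema(row, 2) for row in rows]
```

Here result holds one two-field tuple per input row, matching the shape of the desired result in the issue. The sketch also makes the underlying assumption visible: the value in a short record is always assigned to the leading field.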

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.