You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2011/01/18 03:36:44 UTC

[jira] Assigned: (PIG-1711) Document BinStorage behaviour

     [ https://issues.apache.org/jira/browse/PIG-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1711:
-----------------------------------

    Assignee: Olga Natkovich

> Document BinStorage behaviour 
> ------------------------------
>
>                 Key: PIG-1711
>                 URL: https://issues.apache.org/jira/browse/PIG-1711
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Viraj Bhat
>            Assignee: Olga Natkovich
>             Fix For: 0.9.0
>
>
> We need to document some features of BinStorage that can cause indeterminate results.
> I have a Pig script of this type:
> {code}
> raw = load 'sampledata' using BinStorage() as (col1,col2, col3);
> --filter out null columns
> A = filter raw by col1#'bcookie' is not null;
> B = foreach A generate col1#'bcookie'  as reqcolumn;
> describe B;
> --B: {regcolumn: bytearray}
> X = limit B 5;
> dump X;
> B = foreach A generate (chararray)col1#'bcookie'  as convertedcol;
> describe B;
> --B: {convertedcol: chararray}
> X = limit B 5;
> dump X;
> {code}
> The first dump produces:
> (36co9b55onr8s)
> (36co9b55onr8s)
> (36hilul5oo1q1)
> (36hilul5oo1q1)
> (36l4cj15ooa8a)
> The second dump produces:
> ()
> ()
> ()
> ()
> ()
> So we need to write correct documentation on why this happens. One good explanation seems to be:
> According to Alan:
> BinStorage should not track data lineage. In the case where Pig is using BinStorage (or whatever) for moving data between MR jobs then Pig can figure out the correct cast function to use and apply it. For cases such as the one here where users are storing data using BinStorage and then in a separate Pig Latin script reading it (and thus loosing the type information) it is the users responsibility to correctly cast the data before storing it in BinStorage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.