Posted to dev@pig.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2008/04/02 08:28:24 UTC

[jira] Resolved: (PIG-169) Enhance PigStorage to handle complicated Tuples (i.e. automatically flatten them)

     [ https://issues.apache.org/jira/browse/PIG-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved PIG-169.
-------------------------------

    Resolution: Won't Fix

Currently there is no infrastructure to follow a given alias up the logical tree, check whether it is the result of a GROUP, and then check whether it has not yet been flattened, etc., so marking this as *won't fix*.
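
To make the missing check concrete, here is a minimal Python sketch (not Pig code). The `Node` class, its `op` and `inputs` fields, the operator labels, and the `PASS_THROUGH_OPS` set are all hypothetical stand-ins for a logical plan, not Pig's actual classes. Starting from the node behind an alias, it walks up the plan and reports whether the alias still carries the raw output of a GROUP/COGROUP.

```python
from dataclasses import dataclass, field
from typing import List

GROUPING_OPS = {"GROUP", "COGROUP"}
# Assumption: operators that leave the tuple shape unchanged.
PASS_THROUGH_OPS = {"FILTER"}


@dataclass
class Node:
    """Hypothetical logical-plan node; not a Pig class."""
    op: str
    inputs: List["Node"] = field(default_factory=list)


def comes_from_unflattened_group(node: Node) -> bool:
    """Walk up from `node` and return True only if a GROUP/COGROUP is
    reached before any operator that reshapes the tuples (e.g. FLATTEN)."""
    current = node
    while True:
        if current.op in GROUPING_OPS:
            return True
        if current.op in PASS_THROUGH_OPS and len(current.inputs) == 1:
            current = current.inputs[0]
            continue
        # LOAD, FLATTEN, or anything else that changes the tuple shape.
        return False


load = Node("LOAD")
grouped = Node("GROUP", [load])
filtered = Node("FILTER", [grouped])
flattened = Node("FLATTEN", [grouped])
```

In this sketch `grouped` and `filtered` would qualify for skipping the group field, while `load` and `flattened` would not.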

> Enhance PigStorage to handle complicated Tuples (i.e. automatically flatten them)
> ---------------------------------------------------------------------------------
>
>                 Key: PIG-169
>                 URL: https://issues.apache.org/jira/browse/PIG-169
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> Currently PigStorage (actually Tuple.toDelimitedString) only handles the simple case of plain DataAtoms as fields and fails if any field is some other kind of Datum. It would be nice to enhance it to handle the more complicated cases too. Currently users _have to_ use a *flatten* to convert these to simpler Tuples, which can then be handled by PigStorage.
> ----
> On a related note, there is an interesting caveat with the GROUP/COGROUP operators: they produce tuples whose first field is named 'group' and holds the value on which the grouping was performed.
> E.g.
> Input:
>  <A, 1>
>  <A, 2>
> Pig script:
>  INPUT = load 'input';
>  A = group INPUT by $0;
>  B = stream A through `script`;
> Results in A being: 
> (A, {(A, 1), (A, 2)})
> Now, if PigStorage _auto-flattens_ A it results in:
>  (A, A, 1)
>  (A, A, 2)
> However, the user probably expects the straightforward:
>  (A, 1)
>  (A, 2)
> ---
> Alan suggested that we could use the LOVisitor infrastructure to visit nodes in the tree, save up information (i.e. that a GROUP/COGROUP occurred) and then use that information to get PigStorage to 'skip' the group field while auto-flattening. However, it might have to be done if, and only if, PigStorage is auto-flattening tuples coming directly from a GROUP/COGROUP operator, i.e. there are no other EvalSpecs working on those tuples ...
> ---
> Thoughts?
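
The auto-flatten example quoted above can be sketched in Python (not Pig code). Tuples are modelled as Python tuples and bags as lists of tuples; `auto_flatten`, its `skip_group_field` flag, and `to_delimited` are hypothetical names used only for illustration. The sketch shows why naive flattening repeats the group key and what skipping the 'group' field would produce instead.

```python
from itertools import product


def auto_flatten(record, skip_group_field=False):
    """Flatten a tuple whose fields are atoms or bags (lists of tuples).

    Each bag contributes one output row per tuple it contains; with several
    bags the rows are the cross product. If skip_group_field is True, the
    first field (the 'group' key) is dropped before flattening.
    """
    fields = record[1:] if skip_group_field else record
    # Turn every field into a list of alternatives, each a tuple of atoms.
    choices = [f if isinstance(f, list) else [(f,)] for f in fields]
    # Concatenate one alternative per field into a single flat tuple.
    return [sum(combo, ()) for combo in product(*choices)]


def to_delimited(row, delimiter="\t"):
    """Join the atoms of a flat tuple, as a tab-delimited store would."""
    return delimiter.join(str(atom) for atom in row)


# The grouped tuple from the example: (A, {(A, 1), (A, 2)})
grouped = ("A", [("A", 1), ("A", 2)])

naive = auto_flatten(grouped)                            # repeats the key
skipped = auto_flatten(grouped, skip_group_field=True)   # drops the key
```

Here `naive` corresponds to the (A, A, 1), (A, A, 2) output and `skipped` to the (A, 1), (A, 2) output. The flag should only be set when the tuple comes straight from a GROUP/COGROUP, which is exactly the information the plan does not currently expose.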

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.