Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/03/04 02:18:37 UTC

[jira] Commented: (PIG-1693) There needs to be a way in foreach to indicate "and all the rest of the fields"

    [ https://issues.apache.org/jira/browse/PIG-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002384#comment-13002384 ] 

Thejas M Nair commented on PIG-1693:
------------------------------------

bq. If this doesn't work with named aliases, it's almost useless for me. Numbered references are not maintainable.
Alan's proposal in his comment dated '26/Oct/10 16:27' works with named aliases as well.
I am planning to work on that proposal.

Because "*" is also supported in cogroup, order-by and join statements, I am planning to stay consistent and support this syntax in those statements too.

bq. *+ would mean "all columns not referenced"
In this initial implementation I am planning to support only 'all columns in range'. If there is enough interest in an 'all columns not referenced' feature, it can be added later.
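
As a rough illustration of 'all columns in range' applied to the ten-column example from the issue description: the proposal's exact syntax is not quoted in this message, so the sketch below assumes a ".." range operator, as in Pig's documented project-range expression. The relation Y and the column names follow the issue's example; the load statement and the file name 'input' are hypothetical.

```pig
-- Hypothetical input: the file name and schema are assumptions for illustration.
Y = load 'input' as (firstcol, secondcol, thirdcol, fourthcol, fifthcol,
                     sixthcol, seventhcol, eighthcol, ninthcol, tenthcol);
-- Cast the first column, then project the named range secondcol through tenthcol.
-- The ".." operator is assumed here; it is not quoted from the proposal.
Z = foreach Y generate (int)firstcol, secondcol .. tenthcol;
store Z into 'output';
```

Because the range is written with named aliases, it does not depend on column positions.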

> There needs to be a way in foreach to indicate "and all the rest of the fields"
> -------------------------------------------------------------------------------
>
>                 Key: PIG-1693
>                 URL: https://issues.apache.org/jira/browse/PIG-1693
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>
> A common use case we see in Pig is that people have many columns in their data but want to operate on only a few of them.  Consider, for example, a user who wants to cast one column before storing data with ten columns:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, secondcol, thirdcol, fourthcol, fifthcol, sixthcol, seventhcol, eighthcol, ninthcol, tenthcol;
> store Z into 'output';
> {code}
> Obviously this only gets worse as the user has more columns.  Ideally the above could be transformed to something like:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, "and all the rest";
> store Z into 'output';
> {code}
