Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/03/28 15:03:05 UTC

[jira] [Resolved] (PIG-1693) support project-range expression. (was: There needs to be a way in foreach to indicate "and all the rest of the fields" )

     [ https://issues.apache.org/jira/browse/PIG-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair resolved PIG-1693.
--------------------------------

      Resolution: Fixed
    Release Note: 

Project-range ( '..' ) can be used to project a range of columns from the input.
For example, the expressions -
..$x  : projects columns $0 through $x, inclusive
$x..  : projects columns $x through the last column, inclusive
$x..$y : projects columns $x through $y, inclusive
If the input relation has a schema, you can also refer to columns by alias instead of by position. Aliases and positions can be combined in a single project-range expression (e.g., "col1 .. $5" is valid).
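
As a rough illustration of the three forms, the sketch below assumes a relation IN loaded with a seven-column schema; the file name 'input' and the aliases A, B and C are made up for the example.
{code}
-- Assumption: IN has the schema (col0, col1, col2, col3, col4, col5, col6)
IN = load 'input' as (col0, col1, col2, col3, col4, col5, col6);
A = foreach IN generate .. $2;        -- projects col0, col1, col2
B = foreach IN generate col4 ..;      -- projects col4, col5, col6
C = foreach IN generate col1 .. $5;   -- mixes alias and position: col1 through col5
{code}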


This expression can be used wherever '*' (project-star) is allowed, except as a UDF argument. Support for that use case will be added in PIG-1938.

It can be used in the following statements -
- foreach 
- join
- order (also when it is within a nested foreach block)
- group/cogroup

Examples - 
{code}
grunt> F = foreach IN generate (int)col0, col1 .. col3;      
grunt> describe F;                                           
F: {col0: int,col1: bytearray,col2: bytearray,col3: bytearray}
{code}
{code}
grunt> SORT = order IN by col2 .. col3, col0, col4 ..;
{code}
{code}
J = join IN1 by  $0 .. $3,  IN2 by $0 .. $3;
{code}
{code}
g = group l1 by  b .. c;
{code}
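
For the use case in the issue description (cast one column and pass the rest through unchanged), the project-to-end form could be used roughly as follows. This is a sketch that assumes Y has a schema in which secondcol immediately follows firstcol.
{code}
-- Assumption: Y has a schema whose columns start with firstcol, secondcol
Z = foreach Y generate (int)firstcol, secondcol ..;
store Z into 'output';
{code}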

Limitations:
There are some restrictions on the use of the project-to-end form of project-range (e.g. "x .. ") when the input schema is null (unknown). The use of project-star ('*') is restricted in the same cases.

1. In cogroup/group statements, the project-to-end form of project-range is allowed only if the input has a schema.

2. In an order-by statement, if the input schema is null, the project-to-end form of project-range is supported only as the last sort column.
Note: because of bug PIG-1939, this restriction currently applies even when a schema is present. That should be fixed soon.
Example -
{code}
grunt> describe IN;
Schema for IN unknown.

-- Following statement is supported
SORT = order IN by $2 .. $3, $6 ..;

-- Following statement is NOT supported (project-to-end is not the last sort column)
SORT = order IN by $2 .., $6;
{code}



Patch committed to trunk.

> support project-range expression. (was: There needs to be a way in foreach to indicate "and all the rest of the fields" )
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1693
>                 URL: https://issues.apache.org/jira/browse/PIG-1693
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>         Attachments: PIG-1693.1.patch, PIG-1693.2.patch
>
>
> A common use case we see in Pig is people have many columns in their data and they only want to operate on a few of them.  Consider for example if before storing data with ten columns, the user wants to perform a cast on one column:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, secondcol, thridcol, forthcol, fifthcol, sixthcol, seventhcol, eigthcol, ninethcol, tenthcol;
> store Z into 'output';
> {code}
> Obviously this only gets worse as the user has more columns.  Ideally the above could be transformed to something like:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, "and all the rest";
> store Z into 'output';
> {code}
