Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/03/28 15:03:05 UTC
[jira] [Resolved] (PIG-1693) support project-range expression.
(was: There needs to be a way in foreach to indicate "and all the rest of
the fields" )
[ https://issues.apache.org/jira/browse/PIG-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair resolved PIG-1693.
--------------------------------
Resolution: Fixed
Release Note:
Project-range ( '..' ) can be used to project a range of columns from input.
For example, the expressions -
..$x : projects columns $0 through $x, inclusive
$x.. : projects columns $x through the last column, inclusive
$x..$y : projects columns $x through $y, inclusive
If the input relation has a schema, you can also refer to columns by alias instead of by position. You can also mix aliases and column positions in a single project-range expression (i.e., "col1 .. $5" is valid).
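The inclusive semantics of the three forms can be modelled in a few lines of Python. This is only an illustrative sketch, not Pig's implementation: the names project_range and resolve, and the six-column schema, are assumptions made up for the example.

```python
from typing import List, Optional, Union

# A column reference: an int models a position ($3 -> 3), a str models an alias.
Col = Union[int, str]


def resolve(col: Col, schema: List[str]) -> int:
    """Turn a position or an alias into a zero-based column index."""
    if isinstance(col, int):
        return col
    return schema.index(col)  # raises ValueError for an unknown alias


def project_range(start: Optional[Col], end: Optional[Col],
                  schema: List[str]) -> List[str]:
    """Return the columns selected by 'start .. end'; both ends are inclusive.

    start=None models '.. end' (begin at $0).
    end=None models 'start ..' (run through the last column).
    """
    lo = 0 if start is None else resolve(start, schema)
    hi = len(schema) - 1 if end is None else resolve(end, schema)
    return schema[lo:hi + 1]


# Hypothetical schema, used only for this illustration.
schema = ["col0", "col1", "col2", "col3", "col4", "col5"]

print(project_range(None, 2, schema))    # .. $2      -> ['col0', 'col1', 'col2']
print(project_range(3, None, schema))    # $3 ..      -> ['col3', 'col4', 'col5']
print(project_range(1, 3, schema))       # $1 .. $3   -> ['col1', 'col2', 'col3']
print(project_range("col1", 4, schema))  # col1 .. $4 -> ['col1', 'col2', 'col3', 'col4']
```

Resolving both ends to indices first is what lets an alias and a position appear in the same expression; the alias lookup is also the step that requires the input to have a schema.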
This expression can be used in all cases where the use of '*' (project-star) is allowed, except as a UDF argument. Support for that use case will be added in PIG-1938.
It can be used in the following statements -
- foreach
- join
- order (also when it is within a nested foreach block)
- group/cogroup
Examples -
{code}
grunt> F = foreach IN generate (int)col0, col1 .. col3;
grunt> describe F;
F: {col0: int,col1: bytearray,col2: bytearray,col3: bytearray}
{code}
{code}
grunt> SORT = order IN by col2 .. col3, col0, col4 ..;
{code}
{code}
J = join IN1 by $0 .. $3, IN2 by $0 .. $3;
{code}
{code}
g = group l1 by b .. c;
{code}
Limitations:
There are some restrictions on the use of the project-to-end form of project-range (e.g. "x .. ") when the input schema is null (unknown). These are also cases where the use of project-star ('*') is restricted.
1. In cogroup/group statements, the project-to-end form of project-range is allowed only if the input has a schema.
2. In order-by statements, if the input schema is null, the project-to-end form of project-range is supported only as the last sort column.
Note: because of bug PIG-1939, this use in order-by is currently restricted even when a schema is present. That should be fixed soon.
example-
{code}
grunt> describe IN;
Schema for IN unknown.
-- Following statement is supported
SORT = order IN by $2 .. $3, $6 ..;
-- Following statement is NOT supported (project-to-end is not the last sort column)
SORT = order IN by $6 .., $2 .. $3;
{code}
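The order-by restriction for inputs with an unknown schema amounts to a simple positional check, sketched below in Python. This is an assumption-laden model rather than Pig's actual validation code: the function name allowed_without_schema and the (start, end) tuple encoding are invented for illustration, and the sketch covers only the null-schema rule, not the PIG-1939 behaviour.

```python
from typing import List, Optional, Tuple

# Hypothetical model: a sort column is (start, end) by position;
# end=None stands for the project-to-end form, e.g. (6, None) for "$6 ..".
SortCol = Tuple[int, Optional[int]]


def allowed_without_schema(sort_cols: List[SortCol]) -> bool:
    """Apply the order-by rule for inputs whose schema is unknown."""
    # Every sort column except the last must have an explicit end.
    return all(end is not None for _, end in sort_cols[:-1])


print(allowed_without_schema([(2, 3), (6, None)]))  # $2 .. $3, $6 ..  -> True
print(allowed_without_schema([(6, None), (2, 3)]))  # $6 .., $2 .. $3  -> False
```

A plausible reading of the rule is that, without a schema, the number of columns an open-ended range expands to is not known up front, so anything placed after it cannot be positioned.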
Patch committed to trunk.
> support project-range expression. (was: There needs to be a way in foreach to indicate "and all the rest of the fields" )
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-1693
> URL: https://issues.apache.org/jira/browse/PIG-1693
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Reporter: Alan Gates
> Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1693.1.patch, PIG-1693.2.patch
>
>
> A common use case we see in Pig is that people have many columns in their data but only want to operate on a few of them. Consider, for example, a user who wants to cast one column before storing data with ten columns:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, secondcol, thridcol, forthcol, fifthcol, sixthcol, seventhcol, eigthcol, ninethcol, tenthcol;
> store Z into 'output';
> {code}
> Obviously this only gets worse as the user has more columns. Ideally the above could be transformed to something like:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, "and all the rest";
> store Z into 'output';
> {code}