Posted to dev@pig.apache.org by "Vivek Padmanabhan (Commented) (JIRA)" <ji...@apache.org> on 2011/11/15 10:57:52 UTC
[jira] [Commented] (PIG-1324) Logical Optimizer: Nested column pruning
[ https://issues.apache.org/jira/browse/PIG-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150337#comment-13150337 ]
Vivek Padmanabhan commented on PIG-1324:
----------------------------------------
We are still seeing this with Pig 0.9 as well.
a = load 'input' using PigStorage(',') as (act:chararray, f2:int, bcookie:chararray, f4:long);
b1 = filter a by act == 'aaa';
g1 = group b1 all;
-- only b1.bcookie is referenced below, yet act, f2 and f4 are not pruned from the bag
c1 = foreach g1 generate COUNT(b1.bcookie);
store c1 into 'deleteme_junktest01';
Or, with a nested DISTINCT:
c1 = foreach g1 {
    uniq = distinct b1.bcookie;
    generate COUNT(uniq);
};
Since most of the scripts on our clusters use complex data types, solving this would be a great help.
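One rough way to see the pruning opportunity in the first script is a backward liveness pass over its steps. This is a simplified Python model, not Pig's optimizer code: the plan encoding and the `live_columns` helper are hypothetical, and the model assumes a linear plan in which every step passes its input columns through unchanged.

```python
from typing import List, Set, Tuple

# Hypothetical plan encoding: (step name, input columns that the step itself reads)
Step = Tuple[str, Set[str]]


def live_columns(schema: List[str], steps: List[Step]) -> List[Tuple[str, List[str]]]:
    """Hypothetical: columns that must still be present when each step runs."""
    live: Set[str] = set()
    result = []
    # Walk the plan backwards, accumulating the columns each step reads.
    for name, reads in reversed(steps):
        live |= reads  # needed by this step, or by some later step
        result.append((name, [c for c in schema if c in live]))
    result.reverse()
    return result


# Model of the first script: filter on act, group all, COUNT(b1.bcookie).
plan = [("filter", {"act"}), ("group all", set()), ("foreach", {"bcookie"})]
for step, cols in live_columns(["act", "f2", "bcookie", "f4"], plan):
    print(step, cols)
```

Under this model only `act` and `bcookie` have to be loaded, and `act` can be dropped once the filter has run, so the bag inside `g1` would only need `bcookie`.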
> Logical Optimizer: Nested column pruning
> ----------------------------------------
>
> Key: PIG-1324
> URL: https://issues.apache.org/jira/browse/PIG-1324
> Project: Pig
> Issue Type: Sub-task
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
>
> Currently, column pruning does not prune sub-fields inside a complex data type. For example:
> A = load '1.txt' as (a0, a1, a2);
> B = group A by a0;
> C = foreach B generate group, SUM(A.a1);
> Currently, because A is grouped into a bag and part of that bag is used in the following statement, none of the fields inside A can be pruned. We should keep track of sub-fields and figure out that a2 is not actually needed.
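The sub-field tracking proposed in the issue could be modeled roughly as follows. This is a hypothetical Python sketch, not Pig's implementation: `prunable_fields` and the `(bag, sub_field)` reference encoding are illustrative assumptions. A reference with no sub-field stands for a use of the whole bag (for example `generate A`), in which case nothing inside it can be pruned; the current optimizer effectively treats every reference to the bag that way.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical encoding: (bag alias, sub-field); None means the whole bag is used.
Ref = Tuple[str, Optional[str]]


def prunable_fields(schema: List[str], group_keys: Iterable[str],
                    refs: Iterable[Ref], bag_alias: str) -> List[str]:
    """Hypothetical: fields of the grouped input that no later statement needs."""
    needed = set(group_keys)  # the loader must still read the grouping key(s)
    for alias, field in refs:
        if alias != bag_alias:
            continue  # reference to some other relation
        if field is None:
            return []  # whole bag used: every sub-field must be kept
        needed.add(field)
    return [f for f in schema if f not in needed]


# Model of: A = load '1.txt' as (a0, a1, a2);
#           B = group A by a0;
#           C = foreach B generate group, SUM(A.a1);
print(prunable_fields(["a0", "a1", "a2"], ["a0"], [("A", "a1")], "A"))  # prints ['a2']
```

Tracking references per sub-field, rather than per bag, is what lets a2 be identified as unused while a0 (the key) and a1 (the SUM argument) are kept.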