Posted to dev@pig.apache.org by "Gianmarco De Francisci Morales (Commented) (JIRA)" <ji...@apache.org> on 2011/11/02 13:49:32 UTC

[jira] [Commented] (PIG-1660) Consider passing result of COUNT/COUNT_STAR to LIMIT

    [ https://issues.apache.org/jira/browse/PIG-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142096#comment-13142096 ] 

Gianmarco De Francisci Morales commented on PIG-1660:
-----------------------------------------------------

I tested the script (after correcting some small mistakes), and PIG-1926 does solve it.
The tests in TestLimitVariable cover the same features, but they are really end-to-end (e2e) tests, so they would be better placed with the other e2e tests.
I can port them.
Two questions: should I open a JIRA for this, and should we remove the Java unit tests?
                
> Consider passing result of COUNT/COUNT_STAR to LIMIT 
> -----------------------------------------------------
>
>                 Key: PIG-1660
>                 URL: https://issues.apache.org/jira/browse/PIG-1660
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Viraj Bhat
>
> In realistic scenarios we need to split a dataset into segments by using LIMIT, and we would like to achieve that goal within the same Pig script. Here is a case:
> {code}
> A = load '$DATA' using PigStorage(',') as (id, pvs);
> B = group A all;
> C = foreach B generate COUNT_STAR(A) as row_cnt;
> -- get the low 50% segment
> D = order A by pvs;
> E = limit D (long)(C.row_cnt * 0.5);
> store E into '$Eoutput';
> -- get the high 20% segment
> F = order A by pvs DESC;
> G = limit F (long)(C.row_cnt * 0.2);
> store G into '$Goutput';
> {code}
> Since LIMIT only accepts constants, we have to split the operation into two steps in order to pass in the constants for the LIMIT statements. Please consider adding this feature so that the processing can be more efficient.
> Viraj
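
For comparison, the two-step workaround the report describes (needed while LIMIT accepts only constants) might look roughly like the sketch below. It is only a sketch: the script names count.pig and segments.pig and the parameters COUNT_OUT, LOW_N and HIGH_N are hypothetical, and it assumes the segment sizes are computed outside Pig from the stored count and passed back in through parameter substitution (-param).

{code}
-- count.pig (hypothetical name): first run, stores the total row count.
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
store C into '$COUNT_OUT';

-- segments.pig (hypothetical name): second run. LOW_N and HIGH_N are
-- constants computed outside Pig from the stored count and passed in, e.g.
--   pig -param DATA=... -param LOW_N=... -param HIGH_N=... \
--       -param Eoutput=... -param Goutput=... segments.pig
A = load '$DATA' using PigStorage(',') as (id, pvs);
D = order A by pvs;
E = limit D $LOW_N;
store E into '$Eoutput';
F = order A by pvs DESC;
G = limit F $HIGH_N;
store G into '$Goutput';
{code}

With PIG-1926, the single-script form in the report (LIMIT taking a scalar expression) removes the need for the second run.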

--
This message is automatically generated by JIRA.