Posted to dev@pig.apache.org by "Jie Li (JIRA)" <ji...@apache.org> on 2012/06/28 20:45:45 UTC

[jira] [Commented] (PIG-2675) Optimization: Remove unnecessary Limit jobs from plan

    [ https://issues.apache.org/jira/browse/PIG-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403351#comment-13403351 ] 

Jie Li commented on PIG-2675:
-----------------------------

LIMIT is now always compiled into two jobs. We can optimize this at both compile time and run time.

{code}
data = LOAD 'queries/1.txt' AS (k, v, x);
selected = LIMIT data 2;
explain selected;
{code}

For this query, LIMIT is compiled into both the map phase and the reduce phase of the 1st job, whose requestedParallelism is already set to 1, so we don't need to compile the 2nd job.

{code}
data = LOAD 'queries/1.txt' AS (k, v, x);
grouped = GROUP data BY k;
selected = LIMIT grouped 2;
explain selected;
{code}

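For this query, LIMIT is compiled into the reduce phase of the 1st job, whose reducer count is not fixed to 1 at compile time. We therefore still need to compile a 2nd job, but it can be skipped at run time if the 1st job turns out to use only 1 reducer.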
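
To make the two cases above concrete, here is a rough sketch of the decision logic as a small Python model. It is not Pig's actual compiler code: the Job class, its fields, and both function names are hypothetical, and using -1 for "parallelism not fixed at compile time" is an assumption made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for one compiled MapReduce job (not a Pig class)."""
    limit_in_map: bool          # LIMIT is applied in the map phase
    limit_in_reduce: bool       # LIMIT is applied in the reduce phase
    requested_parallelism: int  # reducer count fixed at compile time; -1 if not fixed (assumption)


def needs_second_job_at_compile_time(job: Job) -> bool:
    """The trailing single-reducer LIMIT job is redundant only when the first
    job already limits in its reduce phase and is known to use one reducer."""
    return not (job.limit_in_reduce and job.requested_parallelism == 1)


def can_skip_second_job_at_run_time(job: Job, actual_reducers: int) -> bool:
    """Once the first job's real reducer count is known, the extra LIMIT job
    can be skipped if a single reducer has already applied the limit."""
    return job.limit_in_reduce and actual_reducers == 1


# First query (LOAD + LIMIT): limit in map and reduce, parallelism already 1.
load_limit = Job(limit_in_map=True, limit_in_reduce=True, requested_parallelism=1)

# Second query (GROUP + LIMIT): limit in reduce only, parallelism not fixed.
group_limit = Job(limit_in_map=False, limit_in_reduce=True, requested_parallelism=-1)

print(needs_second_job_at_compile_time(load_limit))
print(needs_second_job_at_compile_time(group_limit))
print(can_skip_second_job_at_run_time(group_limit, actual_reducers=1))
print(can_skip_second_job_at_run_time(group_limit, actual_reducers=4))
```

The split into two checks mirrors the two optimizations: the first can be decided from the plan alone, while the second has to wait until the reducer count of the 1st job is actually known.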

                
> Optimization: Remove unnecessary Limit jobs from plan
> -----------------------------------------------------
>
>                 Key: PIG-2675
>                 URL: https://issues.apache.org/jira/browse/PIG-2675
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Daniel Dai
>
> LIMIT operator always inserts a limiting single-reducer job after PIG-2652.
> We can optimize this job away when the preceding job only has 1 reducer at run-time.
