Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2008/09/11 07:32:45 UTC

[jira] Updated: (PIG-364) Limit return incorrect records when we use multiple reducer

     [ https://issues.apache.org/jira/browse/PIG-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-364:
---------------------------

    Attachment: PIG-364.patch

This patch takes approach 1: it adds one additional map-reduce operator with 1 reducer if the requested parallelism > 1. The behavior of limit is now:

1. If the map plan is closed before the POLimit operator, we put POLimit in the reduce plan and grant the requested parallelism. If the requested parallelism > 1, we close the reduce plan and add one additional map-reduce operator with 1 reducer.

2. If the map plan is open before the POLimit operator, we put POLimit in the map plan, close the map plan, add another POLimit to the reduce plan, and set the parallelism of this map-reduce operator to 1. Although POLimit creates a map-reduce boundary in this case, we do not associate a parallel option with the limit keyword. I believe providing a parallel option with limit would confuse users, because it is relatively hard to explain whether the option will be granted or not.

3. In the limited sort case, we will have a POSort with limit<>-1. If the parallelism for POSort > 1, we add one additional map-reduce operator with 1 reducer.
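
The three cases above can be sketched as a small planning function. This is only an illustration in Python, not the code in the patch: MRJob, plan_limit and plan_limited_sort are hypothetical names, and the sketch assumes that the additional single-reducer job re-applies the limit, which the description above implies but does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MRJob:
    """Hypothetical stand-in for one map-reduce operator in the plan."""
    map_ops: List[str] = field(default_factory=list)
    reduce_ops: List[str] = field(default_factory=list)
    parallelism: int = 1


def plan_limit(map_plan_closed: bool, requested_parallelism: int) -> List[MRJob]:
    """Return the jobs produced for a plain limit (cases 1 and 2)."""
    if map_plan_closed:
        # Case 1: POLimit goes in the reduce plan with the requested parallelism.
        jobs = [MRJob(reduce_ops=["POLimit"], parallelism=requested_parallelism)]
        if requested_parallelism > 1:
            # Assumption: the extra 1-reducer job applies the limit again.
            jobs.append(MRJob(reduce_ops=["POLimit"], parallelism=1))
        return jobs
    # Case 2: POLimit in both map and reduce plans, parallelism forced to 1.
    return [MRJob(map_ops=["POLimit"], reduce_ops=["POLimit"], parallelism=1)]


def plan_limited_sort(sort_parallelism: int) -> List[MRJob]:
    """Return the jobs produced for a POSort that carries a limit (case 3)."""
    jobs = [MRJob(reduce_ops=["POSort(limit)"], parallelism=sort_parallelism)]
    if sort_parallelism > 1:
        # Assumption: the extra 1-reducer job applies the limit again.
        jobs.append(MRJob(reduce_ops=["POLimit"], parallelism=1))
    return jobs


case1 = plan_limit(map_plan_closed=True, requested_parallelism=4)
case2 = plan_limit(map_plan_closed=False, requested_parallelism=4)
case3 = plan_limited_sort(sort_parallelism=4)
```

In every case the last job in the returned list runs with exactly one reducer, which is what bounds the final output.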


> Limit return incorrect records when we use multiple reducer
> -----------------------------------------------------------
>
>                 Key: PIG-364
>                 URL: https://issues.apache.org/jira/browse/PIG-364
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: types_branch
>
>         Attachments: PIG-364.patch
>
>
> Currently we put the Limit(k) operator in the reducer plan. However, with n reducers, we will get up to n*k output records. 
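
A toy simulation of the problem described above, written in Python purely for illustration. It does not use any Pig API; the partition contents and the helper names limit and run_reducers are made up. Each list stands for the records that reach one reducer.

```python
from typing import List


def limit(records: List[int], k: int) -> List[int]:
    """Keep at most the first k records."""
    return records[:k]


def run_reducers(partitions: List[List[int]], k: int) -> List[int]:
    """Apply Limit(k) independently in each reducer and concatenate the outputs."""
    output: List[int] = []
    for part in partitions:
        output.extend(limit(part, k))
    return output


# Hypothetical input: n = 3 reducers, each receiving 4 records.
partitions = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
k = 2

per_reducer = run_reducers(partitions, k)  # n*k = 6 records, more than k
final = limit(per_reducer, k)              # one extra single-reducer pass gives k records

print(len(per_reducer), len(final))
```

The second limit call plays the role of a single-reducer stage: once all per-reducer outputs pass through one place, the result is capped at k.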

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.