Posted to dev@pig.apache.org by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org> on 2011/09/21 22:49:11 UTC
[jira] [Commented] (PIG-2237) LIMIT generates wrong number of records if pig determines no of reducers as more than 1
[ https://issues.apache.org/jira/browse/PIG-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109866#comment-13109866 ]
jiraposter@reviews.apache.org commented on PIG-2237:
----------------------------------------------------
bq. On 2011-08-30 00:15:01, Dmitriy Ryaboy wrote:
bq. > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRUtil.java, line 36
bq. > <https://reviews.apache.org/r/1664/diff/3/?file=36244#file36244line36>
bq. >
bq. > will using this mess up projection push-down?
This function is only used in the map-reduce layer, while "projection push-down" happens in the logical layer, so they should not interfere with each other. What's your concern?
I will address all the other comments (actually, they all come from the original code I restructured :) ).
- Daniel
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1664/#review1684
-----------------------------------------------------------
On 2011-08-29 23:34:23, Daniel Dai wrote:
bq.
bq.
bq. (Updated 2011-08-29 23:34:23)
bq.
bq.
bq. Review request for pig and Thejas Nair.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. See PIG-2237
bq.
bq.
bq. This addresses bug PIG-2237.
bq. https://issues.apache.org/jira/browse/PIG-2237
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/LimitAdjuster.java PRE-CREATION
bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1162260
bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRUtil.java PRE-CREATION
bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1162260
bq. trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1162260
bq. trunk/test/org/apache/pig/test/TestMRCompiler.java 1162260
bq.
bq. Diff: https://reviews.apache.org/r/1664/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Test-patch:
bq. [exec] +1 overall.
bq. [exec]
bq. [exec] +1 @author. The patch does not contain any @author tags.
bq. [exec]
bq. [exec] +1 tests included. The patch appears to include 3 new or modified tests.
bq. [exec]
bq. [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
bq. [exec]
bq. [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
bq. [exec]
bq. [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
bq. [exec]
bq. [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
bq.
bq. Unit test:
bq. all pass.
bq.
bq.
bq. Thanks,
bq.
bq. Daniel
bq.
bq.
> LIMIT generates wrong number of records if pig determines no of reducers as more than 1
> ---------------------------------------------------------------------------------------
>
> Key: PIG-2237
> URL: https://issues.apache.org/jira/browse/PIG-2237
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Anitha Raju
> Assignee: Daniel Dai
> Fix For: 0.9.1, 0.10
>
> Attachments: PIG-2237-1.patch, PIG-2237-2.patch, PIG-2237-3.patch
>
>
> Hi,
> For a script
> ========
> A = load 'test.txt' using PigStorage() as (a:int,b:int);
> B = order A by a ;
> C = limit B 2;
> store C into 'op1' using PigStorage();
> ========
> LIMIT and ORDER BY are done in the same MR job if no explicit PARALLEL clause is specified.
> In this case, the number of reducers is determined by Pig, and sometimes it is calculated to be greater than 1.
> Since the limit is applied on the reduce side, each reduce task applies the limit separately, generating up to n*2 records, where n is the number of reduce tasks calculated by Pig.
> If the number of reduce tasks is specified explicitly with the PARALLEL keyword on ORDER BY,
> ==========
> B = order A by a PARALLEL 4;
> ==========
> another MR job is created with 1 reduce task, where the limit is done.
> In short, the issue occurs when the number of reducers calculated by Pig is greater than 1 and a limit is involved in the MR job.
> The issue can be reproduced by specifying
> ==========
> -Dpig.exec.reducers.bytes.per.reducer
> ==========
> The issue is seen in versions 0.8 and 0.9. It works correctly in 0.7.
> Regards,
> Anitha
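
The behaviour reported above can be illustrated with a small Python simulation. This is only a sketch of the described symptom, not Pig's code: the function names, the sample rows, and the even range partitioning are all assumptions made for illustration. It shows that applying the limit independently in each of n reducers can yield up to n*2 records for "limit B 2", and that one further limit in a single task (as in the extra 1-reducer MR job described for the PARALLEL case) restores the expected count.

```python
def range_partition(sorted_rows, num_reducers):
    """Split already-sorted rows into contiguous chunks, one per reducer (hypothetical helper)."""
    size = -(-len(sorted_rows) // num_reducers)  # ceiling division
    return [sorted_rows[i * size:(i + 1) * size] for i in range(num_reducers)]


def limit_per_reducer(partitions, k):
    """Reported behaviour: every reducer applies LIMIT k to its own partition."""
    return [row for part in partitions for row in part[:k]]


def limit_with_final_pass(partitions, k):
    """Per-reducer limit followed by one more LIMIT k in a single task."""
    return limit_per_reducer(partitions, k)[:k]


# Made-up sample data standing in for column 'a' of test.txt.
rows = sorted([7, 3, 9, 1, 5, 8, 2, 6])
parts = range_partition(rows, num_reducers=4)

buggy = limit_per_reducer(parts, k=2)
fixed = limit_with_final_pass(parts, k=2)

print(len(buggy), buggy)
print(len(fixed), fixed)
```

With 4 simulated reducers and a limit of 2, the per-reducer version returns 8 rows instead of 2, while the version with a final single-task pass returns only the 2 smallest rows.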