Posted to dev@pig.apache.org by "Min Zhou (Commented) (JIRA)" <ji...@apache.org> on 2011/11/24 08:36:40 UTC

[jira] [Commented] (PIG-1270) Push limit into loader

    [ https://issues.apache.org/jira/browse/PIG-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156580#comment-13156580 ] 

Min Zhou commented on PIG-1270:
-------------------------------

@Daniel

It does improve things here, but there is a bug. I ran the test on a 25-node cluster with the following script:
{noformat}
A = load '/tpch/orders' USING PigStorage('\u0001') AS (o_orderkey:int, o_custkey:int, o_orderstatus:chararray, o_totalprice:double, o_orderdate:chararray, o_orderpriority:chararray, o_clerk:chararray, o_shippriority:int, o_comment: chararray);
F = FOREACH A GENERATE o_orderkey;
L = LIMIT F 10;
DUMP L; 
{noformat}

||case||job run time||HDFS bytes read||Average time taken by map tasks||Worst performing map task||
|w/o optimization| 26 sec|12,976,128|1 sec | 1 sec|
|with optimization| 24 sec|19,347,931,305|3 sec | 5 sec|

With your patch, the LimitOptimizer removes LOLimit from the logical plan after setting the limit on LOLoad, so the script compiles to a map-only job. Each map task then applies the limit on its own, so the number of records in the result would be map_num * 10, which is incorrect.
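
To make the row count problem concrete, here is a toy Python model of it. This is not Pig code: the function names, the number of splits, and the split sizes are all made up for illustration. It also shows that keeping one final limit over the combined output would restore the expected count, which is only one possible way to fix it and not necessarily what the patch will do.

```python
# Toy model, not Pig code: each "map task" reads exactly one input split.

def run_map_only(splits, limit):
    """Limit pushed into the loader only: every map task stops after `limit` records."""
    out = []
    for split in splits:
        # Early stop per task: at most `limit` records leave each map.
        out.extend(split[:limit])
    return out


def run_with_final_limit(splits, limit):
    """Same early stop per task, plus one final limit over the combined output."""
    return run_map_only(splits, limit)[:limit]


# Hypothetical input: 4 splits (4 map tasks) of 100 order keys each.
splits = [list(range(i * 100, (i + 1) * 100)) for i in range(4)]

buggy = run_map_only(splits, 10)
fixed = run_with_final_limit(splits, 10)

print(len(buggy))  # 40, i.e. map_num * 10
print(len(fixed))  # 10
```

In this model the per-task early stop still saves work in each map, while the final limit is what keeps the result at 10 records.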

I will submit a patch soon.





                
> Push limit into loader
> ----------------------
>
>                 Key: PIG-1270
>                 URL: https://issues.apache.org/jira/browse/PIG-1270
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>         Attachments: PIG-1270-1.patch, PIG-1270-2.patch
>
>
> We can optimize the limit operation by stopping early in PigRecordReader. In general, we need a way to communicate between PigRecordReader and the execution pipeline. POLimit could tell PigRecordReader that we already have enough records and that it should stop feeding more data.
