
Posted to dev@pig.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2010/07/12 08:50:51 UTC

[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword

    [ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887283#action_12887283 ] 

Ashutosh Chauhan commented on PIG-1249:
---------------------------------------

The MapReduce framework has a related JIRA, MAPREDUCE-1521 (https://issues.apache.org/jira/browse/MAPREDUCE-1521). It has two implications for Pig:

1) We need to reconsider whether Pig should still set the number of reducers on the user's behalf. We could choose not to "intelligently" pick the number of reducers and instead let the framework fail any job that does not "correctly" specify it. Pig would then be out of this guessing game, and the framework would force users to specify the number of reducers correctly.

2) Since the MR framework will now fail jobs based on configured limits, operators for which Pig does compute and set the number of reducers (such as skewed join) need to be aware of those limits, so that the number of reducers they compute falls within them.
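To make point 2 concrete, here is a minimal Python sketch of how an operator that computes its own reducer count (such as skewed join) might clamp that count into framework-enforced limits. The ReducerLimits class and both of its limits (a maximum reducer count and a maximum number of input bytes per reducer) are hypothetical; the limits MAPREDUCE-1521 actually enforces may be named and defined differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReducerLimits:
    """Hypothetical framework-side limits on a job's reducer count.

    Neither field corresponds to a real Hadoop or Pig configuration key;
    they only illustrate the kind of limits a framework could enforce.
    """
    max_reducers: int
    max_bytes_per_reducer: int

    def min_reducers_for(self, input_bytes: int) -> int:
        # Fewest reducers that keep each reducer under the per-reducer byte
        # limit (integer ceiling division, so no float rounding on large inputs).
        return max(1, -(-input_bytes // self.max_bytes_per_reducer))

    def would_reject(self, input_bytes: int, reducers: int) -> bool:
        # Mimics a framework that fails jobs whose reducer count is out of range.
        return reducers > self.max_reducers or reducers < self.min_reducers_for(input_bytes)

    def clamp(self, input_bytes: int, computed: int) -> int:
        # What an operator could do with its computed count so the job is accepted.
        lower = self.min_reducers_for(input_bytes)
        if lower > self.max_reducers:
            # No reducer count satisfies both limits; the job fails regardless.
            raise ValueError("input too large for any reducer count allowed by the limits")
        return min(max(computed, lower), self.max_reducers)


limits = ReducerLimits(max_reducers=999, max_bytes_per_reducer=1_000_000_000)
input_bytes = 500 * 10**9  # 500 GB of reduce input
print(limits.would_reject(input_bytes, 1))  # True
print(limits.clamp(input_bytes, 1))         # 500
print(limits.clamp(input_bytes, 5000))      # 999
```

Raising when no valid count exists, rather than silently returning the cap, keeps the operator from submitting a job the framework is certain to reject.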

> Safe-guards against misconfigured Pig scripts without PARALLEL keyword
> ----------------------------------------------------------------------
>
>                 Key: PIG-1249
>                 URL: https://issues.apache.org/jira/browse/PIG-1249
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Arun C Murthy
>            Assignee: Jeff Zhang
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, PIG_1249_3.patch
>
>
> It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword.
> We've seen a fair number of instances where naive users process huge data-sets (>10TB) with badly mis-configured #reduces e.g. 1 reduce. 
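The safeguard requested above could work roughly as follows: when a script omits PARALLEL, estimate a reducer count from the input size instead of falling back to a single reducer. This is a minimal Python sketch; the function choose_reducers, the 1 GB-per-reducer figure and the cap of 999 are hypothetical values chosen for illustration, not Pig's actual configuration.

```python
from typing import Optional

# Hypothetical defaults for illustration only; not taken from Pig's configuration.
DEFAULT_BYTES_PER_REDUCER = 1_000_000_000  # 1 GB of input per reducer
DEFAULT_MAX_REDUCERS = 999


def choose_reducers(input_bytes: int,
                    parallel: Optional[int] = None,
                    bytes_per_reducer: int = DEFAULT_BYTES_PER_REDUCER,
                    max_reducers: int = DEFAULT_MAX_REDUCERS) -> int:
    """Pick a reducer count for one MapReduce job.

    If the script set PARALLEL, honour it. Otherwise estimate one reducer
    per `bytes_per_reducer` of input, capped at `max_reducers`.
    """
    if parallel is not None:
        if parallel < 1:
            raise ValueError("PARALLEL must be at least 1")
        return parallel
    estimate = -(-input_bytes // bytes_per_reducer)  # ceiling division on ints
    return max(1, min(estimate, max_reducers))


print(choose_reducers(10 * 10**12))              # ~10 TB, no PARALLEL: 999 (capped)
print(choose_reducers(3_500_000_000))            # 3.5 GB: 4
print(choose_reducers(10 * 10**12, parallel=1))  # explicit PARALLEL 1 is honoured: 1
```

Honouring an explicit PARALLEL keeps the user in control; the estimate only replaces the silent single-reducer default.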

-- 
This message is automatically generated by JIRA.