Posted to user@pig.apache.org by jiang licht <li...@yahoo.com> on 2010/08/26 00:52:24 UTC

Pig restricts size of mapper/reducer output?

Is there a way to tell Pig to restrict the size of the map/reduce output that can be saved to DFS? E.g., if a job creates data over the limit, it won't be allowed to save the result to DFS and the job will fail.

This would help prevent unexpectedly huge data from being saved to DFS by the mappers/reducers a Pig script creates. It assumes we have an advance estimate of how much data a Pig script will generate; then, with this quota in place, if an over-sized result is generated, it won't be saved and the job fails.

Thanks,
Michael



Re: Pig restricts size of mapper/reducer output?

Posted by Alan Gates <ga...@yahoo-inc.com>.
HDFS supports quotas, so you can control it that way, but obviously this will affect all your HDFS users, not just Pig scripts.

Alan.
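
For reference, HDFS space quotas are set per directory with the dfsadmin tool. The sketch below uses the Hadoop-era `hadoop dfsadmin` command names; the directory path is a hypothetical example, and setting quotas requires superuser privileges.

```shell
# Set a 10 GB space quota on the output directory (superuser only).
# Note: the space quota counts replicated bytes, so with replication
# factor 3 this allows roughly 3.3 GB of logical data.
hadoop dfsadmin -setSpaceQuota 10g /user/michael/output

# Check the current quota and usage for the directory.
hadoop fs -count -q /user/michael/output

# Remove the quota later if it is no longer needed.
hadoop dfsadmin -clrSpaceQuota /user/michael/output
```

Once the quota is in place, any write that would push the directory past the limit fails with a DSQuotaExceededException, which in turn fails the map/reduce task trying to write, so the job aborts rather than saving the over-sized result.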

On Aug 25, 2010, at 3:52 PM, jiang licht wrote:

> [original message quoted in full]