Posted to user@pig.apache.org by Haitao Yao <ya...@gmail.com> on 2012/09/16 04:52:28 UTC

How to force the script to finish the job and continue with the following script?

Hi, all
	I forgot the keyword that forces Pig to finish the current job before continuing with the rest of the script.
	My job failed because of an OOME, so I want to split the job into smaller ones while still keeping them in a single Pig script (because the script is generated).
	Is there any keyword that can achieve this?
	Thanks.



Haitao Yao
yao.erix@gmail.com
weibo: @haitao_yao
Skype:  haitao.yao.final


Re: How to force the script to finish the job and continue with the following script?

Posted by Adam Kawa <ka...@gmail.com>.
Probably because you process less than 1 GB of input data?

If you did not use "set default_parallel" or the PARALLEL clause in
your script, Pig (since 0.8) automatically calculates the number of
reducers based on a heuristic that allocates one reducer for every
1 GB of input data (input data, not intermediate map outputs). You may
want to read about parallelization of the reduce phase in Apache Pig
at http://pig.apache.org/docs/r0.10.0/perf.html#parallel.
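As a minimal sketch of the two options (aliases, paths, and reducer counts here are made up):

```pig
-- Script-wide default: use 20 reducers unless overridden
SET default_parallel 20;

logs = LOAD 'input/logs' AS (user:chararray, bytes:long);
grouped = GROUP logs BY user;            -- uses the default (20 reducers)
-- Per-operator override via the PARALLEL clause
heavy = GROUP logs BY user PARALLEL 40;  -- this job gets 40 reducers
```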

Please note that this heuristic is rather simple and acts as a safety
net in case you forget to specify the right number of reducers using
"set default_parallel" or the PARALLEL clause. Since the number of
reducers must be calculated before the job (and thus the mappers)
starts, and Pig cannot predict how much data the mappers will
generate, the heuristic assumes that the size of the data does not
change (i.e., the reducers receive the same amount of data as the
mappers). Some operators are not a good fit for this heuristic,
e.g. CROSS A, B, since it multiplies the size of the input data (in
such a case, the calculated number of reducers will probably be too
small).
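In such a case you can hint Pig explicitly (a sketch; A, B, and the reducer count are placeholders):

```pig
-- CROSS multiplies the input sizes, so the 1-reducer-per-GB heuristic
-- undershoots; request more reducers explicitly on the operator
C = CROSS A, B PARALLEL 50;
```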

Best,
Adam

2012/9/17 Haitao Yao <ya...@gmail.com>:
> I just want to split the big job into smaller ones but still write them into one script file.
>
> I still do not know why the first of the compiled jobs has only 1 reducer.
>
> Thanks.
>
> Haitao Yao
> yao.erix@gmail.com
> weibo: @haitao_yao
> Skype:  haitao.yao.final
>
> On 2012-9-17, at 12:35 AM, Alan Gates wrote:
>
>> 'exec' will force your job to start.  However, I strongly doubt this will solve your OOME problem, since some part of your job is running out of memory, and whichever part that is will, I suspect, still fail.  Pig jobs don't generally accrue memory as they go, since most memory-intensive operations are done in their own tasks.  If you can isolate the part of your script that is causing the OOME (which exec should help with) and send that portion to the list, we may be able to help figure out what's causing the issue.
>>
>> Alan.
>>
>> On Sep 15, 2012, at 10:52 PM, Haitao Yao wrote:
>>
>>> Hi, all
>>>      I forgot the keyword that forces Pig to finish the current job before continuing with the rest of the script.
>>>      My job failed because of an OOME, so I want to split the job into smaller ones while still keeping them in a single Pig script (because the script is generated).
>>>      Is there any keyword that can achieve this?
>>>      Thanks.
>>>
>>>
>>>
>>> Haitao Yao
>>> yao.erix@gmail.com
>>> weibo: @haitao_yao
>>> Skype:  haitao.yao.final
>>>
>>
>

Re: How to force the script to finish the job and continue with the following script?

Posted by Haitao Yao <ya...@gmail.com>.
I just want to split the big job into smaller ones but still write them into one script file.

I still do not know why the first of the compiled jobs has only 1 reducer.

Thanks.

Haitao Yao
yao.erix@gmail.com
weibo: @haitao_yao
Skype:  haitao.yao.final

On 2012-9-17, at 12:35 AM, Alan Gates wrote:

> 'exec' will force your job to start.  However, I strongly doubt this will solve your OOME problem, since some part of your job is running out of memory, and whichever part that is will, I suspect, still fail.  Pig jobs don't generally accrue memory as they go, since most memory-intensive operations are done in their own tasks.  If you can isolate the part of your script that is causing the OOME (which exec should help with) and send that portion to the list, we may be able to help figure out what's causing the issue.
> 
> Alan.
> 
> On Sep 15, 2012, at 10:52 PM, Haitao Yao wrote:
> 
>> Hi, all
>> 	I forgot the keyword that forces Pig to finish the current job before continuing with the rest of the script.
>> 	My job failed because of an OOME, so I want to split the job into smaller ones while still keeping them in a single Pig script (because the script is generated).
>> 	Is there any keyword that can achieve this?
>> 	Thanks.
>> 
>> 
>> 
>> Haitao Yao
>> yao.erix@gmail.com
>> weibo: @haitao_yao
>> Skype:  haitao.yao.final
>> 
> 


Re: How to force the script to finish the job and continue with the following script?

Posted by Alan Gates <ga...@hortonworks.com>.
'exec' will force your job to start.  However, I strongly doubt this will solve your OOME problem, since some part of your job is running out of memory, and whichever part that is will, I suspect, still fail.  Pig jobs don't generally accrue memory as they go, since most memory-intensive operations are done in their own tasks.  If you can isolate the part of your script that is causing the OOME (which exec should help with) and send that portion to the list, we may be able to help figure out what's causing the issue.
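For reference, a minimal sketch of how exec can break one generated script into separate jobs (all aliases and paths below are made up):

```pig
-- First stage: runs as its own job
raw = LOAD 'input/data' AS (id:int, val:chararray);
filtered = FILTER raw BY id > 0;
STORE filtered INTO 'tmp/stage1';

-- Force everything above to execute before Pig plans the rest
exec

-- Second stage: starts from the materialized intermediate data
stage1 = LOAD 'tmp/stage1' AS (id:int, val:chararray);
grouped = GROUP stage1 BY id;
STORE grouped INTO 'output/final';
```

Because exec forces the STORE above it to complete, the two stages run as independent jobs, which should also make it easier to see which stage hits the OOME.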

Alan.

On Sep 15, 2012, at 10:52 PM, Haitao Yao wrote:

> Hi, all
> 	I forgot the keyword that forces Pig to finish the current job before continuing with the rest of the script.
> 	My job failed because of an OOME, so I want to split the job into smaller ones while still keeping them in a single Pig script (because the script is generated).
> 	Is there any keyword that can achieve this?
> 	Thanks.
> 
> 
> 
> Haitao Yao
> yao.erix@gmail.com
> weibo: @haitao_yao
> Skype:  haitao.yao.final
>