Posted to mapreduce-user@hadoop.apache.org by Steve Lewis <lo...@gmail.com> on 2012/01/18 18:49:59 UTC

Writing large output kills job with timeout - need ideas

I am running a mapper job which generates a large number of output records
for every input record: about 32,000,000,000 output records from about
150 mappers, each record about 200 bytes.
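
Back of the envelope, that works out to roughly:

  32,000,000,000 records * 200 bytes  ~= 6.4 TB of map output in total
  6.4 TB / 150 mappers                ~= 43 GB of output per mapper
  32,000,000,000 / 150                ~= 213,000,000 records per mapper
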
The job is failing with task timeouts.
When I alter the code to do exactly what it did before but emit only 1 in
100 output records, it runs to completion with no difficulty.
I believe I am saturating some local resource on the mapper nodes, but this
gets WAY beyond my knowledge of what is going on internally.
Any bright ideas?
-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com

Re: Writing large output kills job with timeout - need ideas

Posted by Harsh J <ha...@cloudera.com>.
An earlier reply at http://search-hadoop.com/m/e9dM3rw9IP1 may help
you get past the idle-task issue, if the task only looks idle because it
is busy processing and not because it has really frozen.

On Thu, Jan 26, 2012 at 8:45 PM, Radim Kolar <hs...@sendmail.cz> wrote:
> Any bright ideas?
>
>
> call a status update or progress() at least once every 600 seconds
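
A minimal sketch of that in a new-API mapper. The fan-out loop and the
class name are made up for illustration; the point is the periodic
context.progress() / context.setStatus() calls inside the emit loop:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ExpandingMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  // Report liveness every 100,000 emitted records; any interval
  // comfortably under the 600-second task timeout will do.
  private static final long REPORT_INTERVAL = 100000L;

  private final Text outValue = new Text();
  private long emitted = 0;

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Stand-in for the real expansion: each input record fans out
    // into many ~200-byte output records.
    for (long i = 0; i < 1000000L; i++) {
      outValue.set(value.toString() + "#" + i);
      context.write(key, outValue);
      if (++emitted % REPORT_INTERVAL == 0) {
        context.progress();  // tells the TaskTracker the task is alive
        context.setStatus("emitted " + emitted + " records");
      }
    }
  }
}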



-- 
Harsh J
Customer Ops. Engineer, Cloudera

Re: Writing large output kills job with timeout - need ideas

Posted by Radim Kolar <hs...@sendmail.cz>.
> Any bright ideas?


call a status update or progress() at least once every 600 seconds
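
For reference, the 600 seconds is the default value of mapred.task.timeout
(600000 ms in Hadoop 1.x; renamed mapreduce.task.timeout in later
releases). If reporting progress from the mapper is awkward, the timeout
can also be raised at job setup. A sketch, with a made-up job name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithLongerTimeout {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Default is 600000 ms (10 minutes); 0 disables the timeout entirely.
    conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);
    Job job = new Job(conf, "large-output-job");
    // ... set mapper class, input/output formats and paths, then:
    // job.waitForCompletion(true);
  }
}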