Posted to mapreduce-user@hadoop.apache.org by Steve Lewis <lo...@gmail.com> on 2014/08/21 21:52:00 UTC

Problem scaling working hadoop job

I have a Hadoop job that I have successfully run with an input set of about
50 million records. To test scaling, and to prepare for where we plan to be
a year or two from now, I tried the same job with about four times as many
records.

Most map tasks fail with the message:
could not find any valid local directory for
tasktracker/jobcache/job.../jars

The first job writes about 4 TB of output and is running on a 0.23 cluster.

My general understanding is that this message occurs when a tmp directory
on a local drive fills up.
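
So far the only check I have is a small Java probe that prints the free space
in each configured local directory on a slave. This is just a rough sketch: it
assumes mapred.local.dir is the property backing those directories and that
the cluster's configuration files are on the classpath when it runs.

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;

    public class LocalDirSpace {
        public static void main(String[] args) {
            // Picks up core-site.xml / mapred-site.xml from the classpath,
            // so run it with the slave's Hadoop configuration available.
            Configuration conf = new Configuration();
            // Directories the tasktracker spills job data into
            // (mapred.local.dir in older configs; /tmp as a fallback here).
            String[] dirs = conf.getStrings("mapred.local.dir", "/tmp");
            for (String d : dirs) {
                File dir = new File(d);
                long freeGb = dir.getUsableSpace() / (1024L * 1024 * 1024);
                long totalGb = dir.getTotalSpace() / (1024L * 1024 * 1024);
                System.out.println(d + ": " + freeGb + " GB free of "
                        + totalGb + " GB");
            }
        }
    }

This only reports free space per directory, so I may well be missing a more
direct way to see what the tasktracker itself is complaining about.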

I have asked our systems team to restart the cluster.

My questions are:
1) Are there commands to run on a slave to see the issue?
2) Will restarting the cluster clear things out and help?
3) Are there ways to tune the job to mitigate this issue?
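
On question 3, the only tuning I have come up with so far is compressing the
intermediate map output so less spill data lands in the tasktracker's local
job cache. Something along these lines is what I had in mind (property names
as I understand them for 0.21+ clusters; the codec, job name, and class name
are just placeholders for my real job setup):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.DefaultCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ScalingTestJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Compress intermediate map output to shrink the spill files
            // written under the local job cache (older clusters use the names
            // mapred.compress.map.output / mapred.map.output.compression.codec).
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.set("mapreduce.map.output.compress.codec",
                     DefaultCodec.class.getName());

            Job job = new Job(conf, "scaling-test");  // placeholder job name
            job.setJarByClass(ScalingTestJob.class);
            // Mapper, reducer and formats would be set here as in the real job.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

I realize this only shrinks the intermediate data rather than fixing whatever
is filling the local directories, so I would still like to hear whether there
are better knobs to turn.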