Posted to mapreduce-user@hadoop.apache.org by Virajith Jalaparti <vi...@gmail.com> on 2011/06/23 16:09:26 UTC

"No space left on device" and "Could not find any valid local directory for taskTracker/jobcache/"

Hi,

I am trying to run a sort job (from hadoop-0.20.2-examples.jar) on 50GB of
data (generated using randomwriter). I am using hadoop-0.20.2 on a cluster
of 3 machines with one machine serving as the master and the other two as
slaves.
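
For reference, the commands I ran look roughly like this (the HDFS path
names are mine; how much data randomwriter produces is controlled through
its job configuration, and I sized it to 50GB):

bin/hadoop jar hadoop-0.20.2-examples.jar randomwriter rand-input
bin/hadoop jar hadoop-0.20.2-examples.jar sort rand-input sort-output
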
I get the following errors for various task attempts:
=======================================================================
11/06/23 07:57:14 INFO mapred.JobClient: Task Id :
attempt_201106230747_0001_m_000119_0, Status : FAILED
Error: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:282)
        at
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
        at
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
        at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
        at
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
        at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1298)
        at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
        at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)

Error initializing attempt_201106230747_0001_m_000119_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
valid local directory for taskTracker/jobcache/job_201106230747_0001/job.xml
        at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
        at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:750)
        at
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1664)
        at
org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
        at
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1629)
=======================================================================

Running "bin/hadoop dfsadmin -report" gives me the following:

==================================================================
Configured Capacity: 465230045184 (433.28 GB)
Present Capacity: 440799092736 (410.53 GB)
DFS Remaining: 371988148224 (346.44 GB)
DFS Used: 68810944512 (64.09 GB)
DFS Used%: 15.61%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 10.1.1.4:50010
Decommission Status : Normal
Configured Capacity: 232615022592 (216.64 GB)
DFS Used: 32243871744 (30.03 GB)
Non DFS Used: 12215377920 (11.38 GB)
DFS Remaining: 188155772928 (175.23 GB)
DFS Used%: 13.86%
DFS Remaining%: 80.89%
Last contact: Thu Jun 23 08:04:51 MDT 2011


Name: 10.1.1.3:50010
Decommission Status : Normal
Configured Capacity: 232615022592 (216.64 GB)
DFS Used: 36567072768 (34.06 GB)
Non DFS Used: 12215574528 (11.38 GB)
DFS Remaining: 183832375296 (171.21 GB)
DFS Used%: 15.72%
DFS Remaining%: 79.03%
Last contact: Thu Jun 23 08:04:51 MDT 2011

==================================================================



I have the following parameters configured in core-site.xml and
mapred-site.xml:

*core-site.xml:*
<configuration>
  <property>
    <!-- base directory for Hadoop's local temporary files -->
    <name>hadoop.tmp.dir</name>
    <value>/mnt/local/mapred/</value>
  </property>
</configuration>

*mapred-site.xml:*
<configuration>
  <property>
    <name>mapred.system.dir</name>
    <value>/mnt/local/mapred/system</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <!-- where map-side intermediate (spill) data is written -->
    <value>/mnt/local/mapred/local</value>
  </property>

  <property>
    <name>mapred.temp.dir</name>
    <value>/mnt/local/mapred/temp</value>
  </property>
</configuration>

/mnt/ is on a local disk at each node in my cluster; it is just 17% full,
with a total disk capacity of around 220GB. Each of the above directories
is created with read/write permissions.
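
I verified the usage on each node with something along the lines of:

df -h /mnt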


I don't see why I am getting the "No space left on device" error with this
configuration. Any ideas on how to solve this problem?

Thanks,
Virajith

Re: "No space left on device" and "Could not find any valid local directory for taskTracker/jobcache/"

Posted by Virajith Jalaparti <vi...@gmail.com>.
In case it is required: I was trying to run this with 400 mappers (my DFS
block size is 128MB, so 50GB of input yields 400 map tasks) and 4 reducers.
Each of my machines has a 2.4 GHz 64-bit quad-core Xeon E5530 "Nehalem"
processor, and I am running 32-bit Ubuntu 10.04.
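
The reducer count was set through the sort example's -r flag, along the
lines of (paths as in my first message):

bin/hadoop jar hadoop-0.20.2-examples.jar sort -r 4 rand-input sort-output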

-Virajith

On Thu, Jun 23, 2011 at 3:09 PM, Virajith Jalaparti <vi...@gmail.com> wrote:

> [original message quoted in full; trimmed]

Re: "No space left on device" and "Could not find any valid local directory for taskTracker/jobcache/"

Posted by Allen Wittenauer <aw...@apache.org>.
On Jun 23, 2011, at 7:09 AM, Virajith Jalaparti wrote:

> Hi,
> 
> I am trying to run a sort job (from hadoop-0.20.2-examples.jar) on 50GB of
> data (generated using randomwriter). I am using hadoop-0.20.2 on a cluster
> of 3 machines with one machine serving as the master and the other two as
> slaves.
> I get the following errors for various the task attempts:
> 
> Error: java.io.IOException: No space left on device
>        at java.io.FileOutputStream.writeBytes(Native Method)
>        at java.io.FileOutputStream.write(FileOutputStream.java:282)
>        at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)

	This usually means you're out of temp mapred space.
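
	Also, if your nodes have more than one physical disk, note that
mapred.local.dir accepts a comma-separated list of directories, so the
TaskTracker spreads map spill files across volumes. A minimal sketch (the
/mnt2 path is hypothetical; substitute your own mount points):

  <property>
    <name>mapred.local.dir</name>
    <!-- spills rotate across all listed volumes; /mnt2 is hypothetical -->
    <value>/mnt/local/mapred/local,/mnt2/local/mapred/local</value>
  </property>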

> 
> /mnt/ is on a local disk at each node in my cluster and it is just 17% full
> with a total disk capacity of around 220GB. Each of the above directories
> are created with read/write permissions.

	Watch how much space you have while the jobs are running.
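
	A minimal way to do that on each slave, assuming mapred.local.dir
is under /mnt as in your configuration:

watch -n 10 'df -h /mnt; du -sh /mnt/local/mapred/local'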