Posted to common-user@hadoop.apache.org by Jim Twensky <ji...@gmail.com> on 2009/04/08 04:22:11 UTC

getting DiskErrorException during map

Hi,

I'm using Hadoop 0.19.1 and I have a very small test cluster with 9 nodes, 8
of them being task trackers. I'm getting the following error, and my jobs
keep failing when the map tasks reach about 30%:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
valid local directory for
taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
        at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
        at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at
org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
        at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
        at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)


I googled many blogs and web pages, but I could neither understand why this
happens nor find a solution to it. What does that error message mean, and
how can I avoid it? Any suggestions?

Thanks in advance,
-jim

Re: getting DiskErrorException during map

Posted by jason hadoop <ja...@gmail.com>.
That is wild; I wonder why I am so lucky. I have had that problem
consistently across multiple machines and locations, to the point that I
don't even try any more; I just symlink things.
It must be some id10t error on my part.

On Tue, Apr 21, 2009 at 8:07 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

> Hey Jason,
>
> We've never had the hadoop.tmp.dir identical on all our nodes.
>
> Brian
>
>
> On Apr 22, 2009, at 10:54 AM, jason hadoop wrote:
>
>  For reasons that I have never bothered to investigate I have never had a
>> cluster work when the hadoop.tmp.dir was not identical on all of the
>> nodes.
>>
>> My solution has always been to just make a symbolic link so that
>> hadoop.tmp.dir was identical and on the machine in question really ended
>> up
>> in the file system/directory tree that I needed the data to appear in.
>>
>> Since this just works and takes a few seconds to setup, I have my reason
>> why
>> I never bothered to try to figure out why per machine configuration of the
>> hadoop.tmp.dir variable doesn't seem to work for me - from 15.1 -> 19.0.
>>
>>
>> On Tue, Apr 21, 2009 at 8:36 AM, Steve Loughran <st...@apache.org>
>> wrote:
>>
>>  Jim Twensky wrote:
>>>
>>>  Yes, here is how it looks:
>>>>
>>>>  <property>
>>>>      <name>hadoop.tmp.dir</name>
>>>>      <value>/scratch/local/jim/hadoop-${user.name}</value>
>>>>  </property>
>>>>
>>>> so I don't know why it still writes to /tmp. As a temporary workaround,
>>>> I
>>>> created a symbolic link from /tmp/hadoop-jim to /scratch/...
>>>> and it works fine now but if you think this might be a considered as a
>>>> bug,
>>>> I can report it.
>>>>
>>>>
>>> I've encountered this somewhere too; could be something is using the java
>>> temp file API, which is not what you want. Try setting java.io.tmpdir to
>>> /scratch/local/tmp just to see if that makes it go away
>>>
>>>
>>>
>>>
>>
>> --
>> Alpha Chapters of my book on Hadoop are available
>> http://www.apress.com/book/view/9781430219422
>>
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: getting DiskErrorException during map

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Jason,

We've never had the hadoop.tmp.dir identical on all our nodes.

Brian

On Apr 22, 2009, at 10:54 AM, jason hadoop wrote:

> For reasons that I have never bothered to investigate I have never  
> had a
> cluster work when the hadoop.tmp.dir was not identical on all of the  
> nodes.
>
> My solution has always been to just make a symbolic link so that
> hadoop.tmp.dir was identical and on the machine in question really  
> ended up
> in the file system/directory tree that I needed the data to appear in.
>
> Since this just works and takes a few seconds to setup, I have my  
> reason why
> I never bothered to try to figure out why per machine configuration  
> of the
> hadoop.tmp.dir variable doesn't seem to work for me - from 15.1 ->  
> 19.0.
>
>
> On Tue, Apr 21, 2009 at 8:36 AM, Steve Loughran <st...@apache.org>  
> wrote:
>
>> Jim Twensky wrote:
>>
>>> Yes, here is how it looks:
>>>
>>>   <property>
>>>       <name>hadoop.tmp.dir</name>
>>>       <value>/scratch/local/jim/hadoop-${user.name}</value>
>>>   </property>
>>>
>>> so I don't know why it still writes to /tmp. As a temporary  
>>> workaround, I
>>> created a symbolic link from /tmp/hadoop-jim to /scratch/...
>>> and it works fine now but if you think this might be a considered  
>>> as a
>>> bug,
>>> I can report it.
>>>
>>
>> I've encountered this somewhere too; could be something is using  
>> the java
>> temp file API, which is not what you want. Try setting  
>> java.io.tmpdir to
>> /scratch/local/tmp just to see if that makes it go away
>>
>>
>>
>
>
> -- 
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422


Re: getting DiskErrorException during map

Posted by jason hadoop <ja...@gmail.com>.
For reasons that I have never bothered to investigate, I have never had a
cluster work when hadoop.tmp.dir was not identical on all of the nodes.

My solution has always been to make a symbolic link, so that hadoop.tmp.dir
is identical everywhere but, on the machine in question, actually resolves to
the file system/directory tree where I need the data to end up.

Since this just works and takes a few seconds to set up, I have never
bothered to figure out why per-machine configuration of the hadoop.tmp.dir
variable doesn't seem to work for me - from 0.15.1 through 0.19.0.
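
Roughly, the workaround looks like this on each node (the /scratch path is
just an example - use whatever local disk you actually want the data on):

    # keep hadoop.tmp.dir identical everywhere, e.g. /tmp/hadoop-jim,
    # but make that path a symlink into the disk with the real capacity
    mkdir -p /scratch/local/jim/hadoop-jim
    rm -rf /tmp/hadoop-jim     # only if its current contents can be thrown away
    ln -s /scratch/local/jim/hadoop-jim /tmp/hadoop-jim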


On Tue, Apr 21, 2009 at 8:36 AM, Steve Loughran <st...@apache.org> wrote:

> Jim Twensky wrote:
>
>> Yes, here is how it looks:
>>
>>    <property>
>>        <name>hadoop.tmp.dir</name>
>>        <value>/scratch/local/jim/hadoop-${user.name}</value>
>>    </property>
>>
>> so I don't know why it still writes to /tmp. As a temporary workaround, I
>> created a symbolic link from /tmp/hadoop-jim to /scratch/...
>> and it works fine now but if you think this might be a considered as a
>> bug,
>> I can report it.
>>
>
> I've encountered this somewhere too; could be something is using the java
> temp file API, which is not what you want. Try setting java.io.tmpdir to
> /scratch/local/tmp just to see if that makes it go away
>
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: getting DiskErrorException during map

Posted by Steve Loughran <st...@apache.org>.
Jim Twensky wrote:
> Yes, here is how it looks:
> 
>     <property>
>         <name>hadoop.tmp.dir</name>
>         <value>/scratch/local/jim/hadoop-${user.name}</value>
>     </property>
> 
> so I don't know why it still writes to /tmp. As a temporary workaround, I
> created a symbolic link from /tmp/hadoop-jim to /scratch/...
> and it works fine now but if you think this might be a considered as a bug,
> I can report it.

I've encountered this somewhere too; it could be that something is using the
Java temp-file API, which is not what you want. Try setting java.io.tmpdir to
/scratch/local/tmp just to see if that makes it go away.
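
If the temp-file calls are happening in the child task JVMs, one way to try
that (just a sketch: mapred.child.java.opts passes JVM flags to the task
processes, and -Xmx200m here only keeps the usual default heap) is

    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx200m -Djava.io.tmpdir=/scratch/local/tmp</value>
    </property>

with /scratch/local/tmp created ahead of time on every task tracker.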



Re: getting DiskErrorException during map

Posted by Jim Twensky <ji...@gmail.com>.
Yes, here is how it looks:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/scratch/local/jim/hadoop-${user.name}</value>
    </property>

so I don't know why it still writes to /tmp. As a temporary workaround, I
created a symbolic link from /tmp/hadoop-jim to /scratch/...
and it works fine now, but if you think this might be considered a bug,
I can report it.

Thanks,
Jim


On Thu, Apr 16, 2009 at 12:44 PM, Alex Loddengaard <al...@cloudera.com> wrote:

> Have you set hadoop.tmp.dir away from /tmp as well?  If hadoop.tmp.dir is
> set somewhere in /scratch vs. /tmp, then I'm not sure why Hadoop would be
> writing to /tmp.
>
> Hope this helps!
>
> Alex
>
> On Wed, Apr 15, 2009 at 2:37 PM, Jim Twensky <ji...@gmail.com>
> wrote:
>
> > Alex,
> >
> > Yes, I bounced the Hadoop daemons after I changed the configuration
> files.
> >
> > I also tried setting  $HADOOP_CONF_DIR to the directory where my
> > hadop-site.xml file resides but it didn't work.
> > However, I'm sure that HADOOP_CONF_DIR is not the issue because other
> > properties that I changed in hadoop-site.xml
> > seem to be properly set. Also, here is a section from my hadoop-site.xml
> > file:
> >
> >    <property>
> >        <name>hadoop.tmp.dir</name>
> >         <value>/scratch/local/jim/hadoop-${user.name}</value>
> >     </property>
> >    <property>
> >        <name>mapred.local.dir</name>
> >         <value>/scratch/local/jim/hadoop-${user.name
> }/mapred/local</value>
> >    </property>
> >
> > I also created /scratch/local/jim/hadoop-jim/mapred/local on each task
> > tracker since I know
> > directories that do not exist are ignored.
> >
> > When I manually ssh to the task trackers, I can see the directory
> > /scratch/local/jim/hadoop-jim/dfs
> > is automatically created so is it seems like  hadoop.tmp.dir is set
> > properly. However, hadoop still creates
> > /tmp/hadoop-jim/mapred/local and uses that directory for the local
> storage.
> >
> > I'm starting to suspect that mapred.local.dir is overwritten to a default
> > value of /tmp/hadoop-${user.name}
> > somewhere inside the binaries.
> >
> > -jim
> >
> > On Tue, Apr 14, 2009 at 4:07 PM, Alex Loddengaard <al...@cloudera.com>
> > wrote:
> >
> > > First, did you bounce the Hadoop daemons after you changed the
> > > configuration
> > > files?  I think you'll have to do this.
> > >
> > > Second, I believe 0.19.1 has hadoop-default.xml baked into the jar.
>  Try
> > > setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives.
> >  For
> > > whatever reason your hadoop-site.xml (and the hadoop-default.xml you
> > tried
> > > to change) are probably not being loaded.  $HADOOP_CONF_DIR should fix
> > > this.
> > >
> > > Good luck!
> > >
> > > Alex
> > >
> > > On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky <ji...@gmail.com>
> > > wrote:
> > >
> > > > Thank you Alex, you are right. There are quotas on the systems that
> I'm
> > > > working. However, I tried to change mapred.local.dir as follows:
> > > >
> > > > --inside hadoop-site.xml:
> > > >
> > > >    <property>
> > > >        <name>mapred.child.tmp</name>
> > > >        <value>/scratch/local/jim</value>
> > > >    </property>
> > > >    <property>
> > > >        <name>hadoop.tmp.dir</name>
> > > >        <value>/scratch/local/jim</value>
> > > >    </property>
> > > >    <property>
> > > >        <name>mapred.local.dir</name>
> > > >        <value>/scratch/local/jim</value>
> > > >    </property>
> > > >
> > > >  and observed that the intermediate map outputs are still being
> written
> > > > under /tmp/hadoop-jim/mapred/local
> > > >
> > > > I'm confused at this point since I also tried setting these values
> > > directly
> > > > inside the hadoop-default.xml and that didn't work either. Is there
> any
> > > > other property that I'm supposed to change? I tried searching for
> > "/tmp"
> > > in
> > > > the hadoop-default.xml file but couldn't find anything else.
> > > >
> > > > Thanks,
> > > > Jim
> > > >
> > > >
> > > > On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard <al...@cloudera.com>
> > > > wrote:
> > > >
> > > > > The getLocalPathForWrite function that throws this Exception
> assumes
> > > that
> > > > > you have space on the disks that mapred.local.dir is configured on.
> > >  Can
> > > > > you
> > > > > verify with `df` that those disks have space available?  You might
> > also
> > > > try
> > > > > moving mapred.local.dir off of /tmp if it's configured to use /tmp
> > > right
> > > > > now; I believe some systems have quotas on /tmp.
> > > > >
> > > > > Hope this helps.
> > > > >
> > > > > Alex
> > > > >
> > > > > On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky <jim.twensky@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm using Hadoop 0.19.1 and I have a very small test cluster with
> 9
> > > > > nodes,
> > > > > > 8
> > > > > > of them being task trackers. I'm getting the following error and
> my
> > > > jobs
> > > > > > keep failing when map processes start hitting 30%:
> > > > > >
> > > > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
> > find
> > > > any
> > > > > > valid local directory for
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
> > > > > >        at
> > > > > >
> > > >
> > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
> > > > > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> > > > > >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
> > > > > >
> > > > > >
> > > > > > I googled many blogs and web pages but I could neither understand
> > why
> > > > > this
> > > > > > happens nor found a solution to this. What does that error
> message
> > > mean
> > > > > and
> > > > > > how can avoid it, any suggestions?
> > > > > >
> > > > > > Thanks in advance,
> > > > > > -jim
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: getting DiskErrorException during map

Posted by Alex Loddengaard <al...@cloudera.com>.
Have you set hadoop.tmp.dir away from /tmp as well?  If hadoop.tmp.dir is
set somewhere in /scratch vs. /tmp, then I'm not sure why Hadoop would be
writing to /tmp.

Hope this helps!

Alex

On Wed, Apr 15, 2009 at 2:37 PM, Jim Twensky <ji...@gmail.com> wrote:

> Alex,
>
> Yes, I bounced the Hadoop daemons after I changed the configuration files.
>
> I also tried setting  $HADOOP_CONF_DIR to the directory where my
> hadop-site.xml file resides but it didn't work.
> However, I'm sure that HADOOP_CONF_DIR is not the issue because other
> properties that I changed in hadoop-site.xml
> seem to be properly set. Also, here is a section from my hadoop-site.xml
> file:
>
>    <property>
>        <name>hadoop.tmp.dir</name>
>         <value>/scratch/local/jim/hadoop-${user.name}</value>
>     </property>
>    <property>
>        <name>mapred.local.dir</name>
>         <value>/scratch/local/jim/hadoop-${user.name}/mapred/local</value>
>    </property>
>
> I also created /scratch/local/jim/hadoop-jim/mapred/local on each task
> tracker since I know
> directories that do not exist are ignored.
>
> When I manually ssh to the task trackers, I can see the directory
> /scratch/local/jim/hadoop-jim/dfs
> is automatically created so is it seems like  hadoop.tmp.dir is set
> properly. However, hadoop still creates
> /tmp/hadoop-jim/mapred/local and uses that directory for the local storage.
>
> I'm starting to suspect that mapred.local.dir is overwritten to a default
> value of /tmp/hadoop-${user.name}
> somewhere inside the binaries.
>
> -jim
>
> On Tue, Apr 14, 2009 at 4:07 PM, Alex Loddengaard <al...@cloudera.com>
> wrote:
>
> > First, did you bounce the Hadoop daemons after you changed the
> > configuration
> > files?  I think you'll have to do this.
> >
> > Second, I believe 0.19.1 has hadoop-default.xml baked into the jar.  Try
> > setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives.
>  For
> > whatever reason your hadoop-site.xml (and the hadoop-default.xml you
> tried
> > to change) are probably not being loaded.  $HADOOP_CONF_DIR should fix
> > this.
> >
> > Good luck!
> >
> > Alex
> >
> > On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky <ji...@gmail.com>
> > wrote:
> >
> > > Thank you Alex, you are right. There are quotas on the systems that I'm
> > > working. However, I tried to change mapred.local.dir as follows:
> > >
> > > --inside hadoop-site.xml:
> > >
> > >    <property>
> > >        <name>mapred.child.tmp</name>
> > >        <value>/scratch/local/jim</value>
> > >    </property>
> > >    <property>
> > >        <name>hadoop.tmp.dir</name>
> > >        <value>/scratch/local/jim</value>
> > >    </property>
> > >    <property>
> > >        <name>mapred.local.dir</name>
> > >        <value>/scratch/local/jim</value>
> > >    </property>
> > >
> > >  and observed that the intermediate map outputs are still being written
> > > under /tmp/hadoop-jim/mapred/local
> > >
> > > I'm confused at this point since I also tried setting these values
> > directly
> > > inside the hadoop-default.xml and that didn't work either. Is there any
> > > other property that I'm supposed to change? I tried searching for
> "/tmp"
> > in
> > > the hadoop-default.xml file but couldn't find anything else.
> > >
> > > Thanks,
> > > Jim
> > >
> > >
> > > On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard <al...@cloudera.com>
> > > wrote:
> > >
> > > > The getLocalPathForWrite function that throws this Exception assumes
> > that
> > > > you have space on the disks that mapred.local.dir is configured on.
> >  Can
> > > > you
> > > > verify with `df` that those disks have space available?  You might
> also
> > > try
> > > > moving mapred.local.dir off of /tmp if it's configured to use /tmp
> > right
> > > > now; I believe some systems have quotas on /tmp.
> > > >
> > > > Hope this helps.
> > > >
> > > > Alex
> > > >
> > > > On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky <ji...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm using Hadoop 0.19.1 and I have a very small test cluster with 9
> > > > nodes,
> > > > > 8
> > > > > of them being task trackers. I'm getting the following error and my
> > > jobs
> > > > > keep failing when map processes start hitting 30%:
> > > > >
> > > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
> find
> > > any
> > > > > valid local directory for
> > > > >
> > > > >
> > > >
> > >
> >
> taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
> > > > >        at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
> > > > >        at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> > > > >        at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
> > > > >        at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
> > > > >        at
> > > > >
> > >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
> > > > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> > > > >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
> > > > >
> > > > >
> > > > > I googled many blogs and web pages but I could neither understand
> why
> > > > this
> > > > > happens nor found a solution to this. What does that error message
> > mean
> > > > and
> > > > > how can avoid it, any suggestions?
> > > > >
> > > > > Thanks in advance,
> > > > > -jim
> > > > >
> > > >
> > >
> >
>

Re: getting DiskErrorException during map

Posted by Jim Twensky <ji...@gmail.com>.
Alex,

Yes, I bounced the Hadoop daemons after I changed the configuration files.

I also tried setting $HADOOP_CONF_DIR to the directory where my
hadoop-site.xml file resides, but it didn't work.
However, I'm sure that HADOOP_CONF_DIR is not the issue, because other
properties that I changed in hadoop-site.xml seem to be properly set.
Also, here is a section from my hadoop-site.xml file:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/scratch/local/jim/hadoop-${user.name}</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/scratch/local/jim/hadoop-${user.name}/mapred/local</value>
    </property>

I also created /scratch/local/jim/hadoop-jim/mapred/local on each task
tracker since I know
directories that do not exist are ignored.

When I manually ssh to the task trackers, I can see that the directory
/scratch/local/jim/hadoop-jim/dfs
is automatically created, so it seems like hadoop.tmp.dir is set
properly. However, Hadoop still creates
/tmp/hadoop-jim/mapred/local and uses that directory for the local storage.

I'm starting to suspect that mapred.local.dir is being overridden with a
default value of /tmp/hadoop-${user.name}
somewhere inside the binaries.
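
To rule that out, here is the kind of quick sanity check I can run on one of
the task trackers (assuming xmllint is installed; paths are illustrative):

    echo $HADOOP_CONF_DIR                              # should be the dir holding hadoop-site.xml
    xmllint --noout $HADOOP_CONF_DIR/hadoop-site.xml   # catch XML typos that would keep a property from applying
    ps -ef | grep TaskTracker                          # the daemon's classpath should include that conf dir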

-jim

On Tue, Apr 14, 2009 at 4:07 PM, Alex Loddengaard <al...@cloudera.com> wrote:

> First, did you bounce the Hadoop daemons after you changed the
> configuration
> files?  I think you'll have to do this.
>
> Second, I believe 0.19.1 has hadoop-default.xml baked into the jar.  Try
> setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives.  For
> whatever reason your hadoop-site.xml (and the hadoop-default.xml you tried
> to change) are probably not being loaded.  $HADOOP_CONF_DIR should fix
> this.
>
> Good luck!
>
> Alex
>
> On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky <ji...@gmail.com>
> wrote:
>
> > Thank you Alex, you are right. There are quotas on the systems that I'm
> > working. However, I tried to change mapred.local.dir as follows:
> >
> > --inside hadoop-site.xml:
> >
> >    <property>
> >        <name>mapred.child.tmp</name>
> >        <value>/scratch/local/jim</value>
> >    </property>
> >    <property>
> >        <name>hadoop.tmp.dir</name>
> >        <value>/scratch/local/jim</value>
> >    </property>
> >    <property>
> >        <name>mapred.local.dir</name>
> >        <value>/scratch/local/jim</value>
> >    </property>
> >
> >  and observed that the intermediate map outputs are still being written
> > under /tmp/hadoop-jim/mapred/local
> >
> > I'm confused at this point since I also tried setting these values
> directly
> > inside the hadoop-default.xml and that didn't work either. Is there any
> > other property that I'm supposed to change? I tried searching for "/tmp"
> in
> > the hadoop-default.xml file but couldn't find anything else.
> >
> > Thanks,
> > Jim
> >
> >
> > On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard <al...@cloudera.com>
> > wrote:
> >
> > > The getLocalPathForWrite function that throws this Exception assumes
> that
> > > you have space on the disks that mapred.local.dir is configured on.
>  Can
> > > you
> > > verify with `df` that those disks have space available?  You might also
> > try
> > > moving mapred.local.dir off of /tmp if it's configured to use /tmp
> right
> > > now; I believe some systems have quotas on /tmp.
> > >
> > > Hope this helps.
> > >
> > > Alex
> > >
> > > On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky <ji...@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm using Hadoop 0.19.1 and I have a very small test cluster with 9
> > > nodes,
> > > > 8
> > > > of them being task trackers. I'm getting the following error and my
> > jobs
> > > > keep failing when map processes start hitting 30%:
> > > >
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > any
> > > > valid local directory for
> > > >
> > > >
> > >
> >
> taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
> > > >        at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
> > > >        at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> > > >        at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
> > > >        at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
> > > >        at
> > > >
> > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
> > > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> > > >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
> > > >
> > > >
> > > > I googled many blogs and web pages but I could neither understand why
> > > this
> > > > happens nor found a solution to this. What does that error message
> mean
> > > and
> > > > how can avoid it, any suggestions?
> > > >
> > > > Thanks in advance,
> > > > -jim
> > > >
> > >
> >
>

Re: getting DiskErrorException during map

Posted by Alex Loddengaard <al...@cloudera.com>.
First, did you bounce the Hadoop daemons after you changed the configuration
files?  I think you'll have to do this.

Second, I believe 0.19.1 has hadoop-default.xml baked into the jar.  Try
setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives.  For
whatever reason, your hadoop-site.xml (and the hadoop-default.xml you tried
to change) is probably not being loaded.  $HADOOP_CONF_DIR should fix this.
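
Concretely, something along these lines (the path is illustrative, and the
variable has to be visible on every node in the environment of whoever starts
the daemons):

    export HADOOP_CONF_DIR=/home/jim/hadoop-conf    # the directory containing hadoop-site.xml
    bin/stop-all.sh
    bin/start-all.sh                                # bounce the daemons so the config is re-read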

Good luck!

Alex

On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky <ji...@gmail.com> wrote:

> Thank you Alex, you are right. There are quotas on the systems that I'm
> working. However, I tried to change mapred.local.dir as follows:
>
> --inside hadoop-site.xml:
>
>    <property>
>        <name>mapred.child.tmp</name>
>        <value>/scratch/local/jim</value>
>    </property>
>    <property>
>        <name>hadoop.tmp.dir</name>
>        <value>/scratch/local/jim</value>
>    </property>
>    <property>
>        <name>mapred.local.dir</name>
>        <value>/scratch/local/jim</value>
>    </property>
>
>  and observed that the intermediate map outputs are still being written
> under /tmp/hadoop-jim/mapred/local
>
> I'm confused at this point since I also tried setting these values directly
> inside the hadoop-default.xml and that didn't work either. Is there any
> other property that I'm supposed to change? I tried searching for "/tmp" in
> the hadoop-default.xml file but couldn't find anything else.
>
> Thanks,
> Jim
>
>
> On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard <al...@cloudera.com>
> wrote:
>
> > The getLocalPathForWrite function that throws this Exception assumes that
> > you have space on the disks that mapred.local.dir is configured on.  Can
> > you
> > verify with `df` that those disks have space available?  You might also
> try
> > moving mapred.local.dir off of /tmp if it's configured to use /tmp right
> > now; I believe some systems have quotas on /tmp.
> >
> > Hope this helps.
> >
> > Alex
> >
> > On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky <ji...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > I'm using Hadoop 0.19.1 and I have a very small test cluster with 9
> > nodes,
> > > 8
> > > of them being task trackers. I'm getting the following error and my
> jobs
> > > keep failing when map processes start hitting 30%:
> > >
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> any
> > > valid local directory for
> > >
> > >
> >
> taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
> > >        at
> > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
> > >        at
> > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> > >        at
> > >
> > >
> >
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
> > >        at
> > >
> > >
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
> > >        at
> > >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
> > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> > >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
> > >
> > >
> > > I googled many blogs and web pages but I could neither understand why
> > this
> > > happens nor found a solution to this. What does that error message mean
> > and
> > > how can avoid it, any suggestions?
> > >
> > > Thanks in advance,
> > > -jim
> > >
> >
>

Re: getting DiskErrorException during map

Posted by Jim Twensky <ji...@gmail.com>.
Thank you Alex, you are right. There are quotas on the systems that I'm
working on. However, I tried to change mapred.local.dir as follows:

--inside hadoop-site.xml:

    <property>
        <name>mapred.child.tmp</name>
        <value>/scratch/local/jim</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/scratch/local/jim</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/scratch/local/jim</value>
    </property>

and observed that the intermediate map outputs are still being written
under /tmp/hadoop-jim/mapred/local.

I'm confused at this point since I also tried setting these values directly
inside the hadoop-default.xml and that didn't work either. Is there any
other property that I'm supposed to change? I tried searching for "/tmp" in
the hadoop-default.xml file but couldn't find anything else.
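
For reference, the relevant defaults in hadoop-default.xml look roughly like
this, which would explain why searching for "/tmp" turns up so little - the
only literal /tmp is the hadoop.tmp.dir default, and mapred.local.dir is
derived from it:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-${user.name}</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>${hadoop.tmp.dir}/mapred/local</value>
    </property>

So if the intermediate outputs still land in /tmp/hadoop-jim/mapred/local,
the daemons are presumably still seeing the stock hadoop.tmp.dir rather than
my edited value.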

Thanks,
Jim


On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard <al...@cloudera.com> wrote:

> The getLocalPathForWrite function that throws this Exception assumes that
> you have space on the disks that mapred.local.dir is configured on.  Can
> you
> verify with `df` that those disks have space available?  You might also try
> moving mapred.local.dir off of /tmp if it's configured to use /tmp right
> now; I believe some systems have quotas on /tmp.
>
> Hope this helps.
>
> Alex
>
> On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky <ji...@gmail.com> wrote:
>
> > Hi,
> >
> > I'm using Hadoop 0.19.1 and I have a very small test cluster with 9
> nodes,
> > 8
> > of them being task trackers. I'm getting the following error and my jobs
> > keep failing when map processes start hitting 30%:
> >
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
> > valid local directory for
> >
> >
> taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
> >        at
> >
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
> >        at
> >
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> >        at
> >
> >
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
> >        at
> >
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
> >        at
> > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
> >
> >
> > I googled many blogs and web pages but I could neither understand why
> this
> > happens nor found a solution to this. What does that error message mean
> and
> > how can avoid it, any suggestions?
> >
> > Thanks in advance,
> > -jim
> >
>

Re: getting DiskErrorException during map

Posted by Alex Loddengaard <al...@cloudera.com>.
The getLocalPathForWrite function that throws this Exception assumes that
you have space on the disks that mapred.local.dir is configured on.  Can you
verify with `df` that those disks have space available?  You might also try
moving mapred.local.dir off of /tmp if it's configured to use /tmp right
now; I believe some systems have quotas on /tmp.
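
For example (paths are illustrative - substitute whatever mapred.local.dir
actually points at on your nodes):

    df -h /tmp        # free space on the disk the map outputs currently go to
    quota -s          # any per-user quota, if quotas are enabled there

and, to move the intermediate data onto a bigger disk, something along the
lines of:

    <property>
        <name>mapred.local.dir</name>
        <value>/data/hadoop/mapred/local</value>
    </property>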

Hope this helps.

Alex

On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky <ji...@gmail.com> wrote:

> Hi,
>
> I'm using Hadoop 0.19.1 and I have a very small test cluster with 9 nodes,
> 8
> of them being task trackers. I'm getting the following error and my jobs
> keep failing when map processes start hitting 30%:
>
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
> valid local directory for
>
> taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
>        at
>
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
>        at
>
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
>        at
>
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
>        at
>
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
>        at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>        at org.apache.hadoop.mapred.Child.main(Child.java:158)
>
>
> I googled many blogs and web pages but I could neither understand why this
> happens nor found a solution to this. What does that error message mean and
> how can avoid it, any suggestions?
>
> Thanks in advance,
> -jim
>