Posted to common-user@hadoop.apache.org by Jim Twensky <ji...@gmail.com> on 2009/04/08 04:22:11 UTC
getting DiskErrorException during map
Hi,
I'm using Hadoop 0.19.1 and I have a very small test cluster with 9 nodes, 8
of them being task trackers. I'm getting the following error and my jobs
keep failing when map processes start hitting 30%:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
I searched many blogs and web pages, but I could neither understand why this
happens nor find a solution. What does that error message mean, and how can I
avoid it? Any suggestions?
Thanks in advance,
-jim
Re: getting DiskErrorException during map
Posted by jason hadoop <ja...@gmail.com>.
That is wild; I wonder why I am so lucky. I have had that problem
consistently across multiple machines and locations, to the point that I
don't even try any more; I just symlink things.
It must be some id10t error on my part.
--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
Re: getting DiskErrorException during map
Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Jason,
We've never had the hadoop.tmp.dir identical on all our nodes.
Brian
Re: getting DiskErrorException during map
Posted by jason hadoop <ja...@gmail.com>.
For reasons I have never bothered to investigate, I have never had a
cluster work when hadoop.tmp.dir was not identical on all of the nodes.
My solution has always been to make a symbolic link so that hadoop.tmp.dir
is identical everywhere and, on the machine in question, actually ends up
in the filesystem/directory tree where I need the data to live.
Since this just works and takes a few seconds to set up, I never bothered
to figure out why per-machine configuration of the hadoop.tmp.dir variable
doesn't seem to work for me, from 0.15.1 through 0.19.0.
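The symlink workaround described above can be sketched as a short shell session. The paths here are illustrative stand-ins (a demo under /tmp so it runs anywhere); on a real node the target would be the large scratch filesystem, e.g. /scratch/local/<user>:

```shell
# Illustrative paths only: on a real node SCRATCH would be the big local
# filesystem (e.g. /scratch/local/<user>), not a /tmp subdirectory.
SCRATCH="/tmp/scratch-demo/hadoop-$(id -un)"
LINK="/tmp/hadoop-$(id -un)-demo"   # the path Hadoop insists on writing to

mkdir -p "$SCRATCH"
rm -rf "$LINK"
ln -s "$SCRATCH" "$LINK"            # writes via $LINK now land on scratch

echo probe > "$LINK/file.out"       # simulate a task writing under the link
readlink "$LINK"
cat "$SCRATCH/file.out"
```

After this, anything the TaskTracker writes under the /tmp-style path physically lands on the scratch filesystem, which is the effect Jason describes.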
On Tue, Apr 21, 2009 at 8:36 AM, Steve Loughran <st...@apache.org> wrote:
> Jim Twensky wrote:
>
>> Yes, here is how it looks:
>>
>> <property>
>> <name>hadoop.tmp.dir</name>
>> <value>/scratch/local/jim/hadoop-${user.name}</value>
>> </property>
>>
>> so I don't know why it still writes to /tmp. As a temporary workaround, I
>> created a symbolic link from /tmp/hadoop-jim to /scratch/...
>> and it works fine now but if you think this might be a considered as a
>> bug,
>> I can report it.
>>
>
> I've encountered this somewhere too; could be something is using the java
> temp file API, which is not what you want. Try setting java.io.tmpdir to
> /scratch/local/tmp just to see if that makes it go away
>
>
>
--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
Re: getting DiskErrorException during map
Posted by Steve Loughran <st...@apache.org>.
I've encountered this somewhere too; it could be that something is using the
Java temp-file API, which is not what you want. Try setting
java.io.tmpdir to /scratch/local/tmp just to see if that makes the problem go away.
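One way to act on this suggestion (a sketch; the path and the choice of mechanism are assumptions, not from this thread): for the task JVMs in 0.19 you can pass the system property through mapred.child.java.opts in hadoop-site.xml:

```xml
<!-- hadoop-site.xml: pass java.io.tmpdir to task JVMs (illustrative path).
     -Xmx200m repeats the 0.19 default heap setting, since this property
     replaces the default value rather than appending to it. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Djava.io.tmpdir=/scratch/local/tmp</value>
</property>
```

For the daemons themselves, the rough equivalent would be adding -Djava.io.tmpdir=/scratch/local/tmp to HADOOP_OPTS in conf/hadoop-env.sh.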
Re: getting DiskErrorException during map
Posted by Jim Twensky <ji...@gmail.com>.
Yes, here is how it looks:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/scratch/local/jim/hadoop-${user.name}</value>
</property>

so I don't know why it still writes to /tmp. As a temporary workaround, I
created a symbolic link from /tmp/hadoop-jim to /scratch/...
and it works fine now, but if you think this might be considered a bug,
I can report it.
Thanks,
Jim
Re: getting DiskErrorException during map
Posted by Alex Loddengaard <al...@cloudera.com>.
Have you set hadoop.tmp.dir away from /tmp as well? If hadoop.tmp.dir is
set somewhere in /scratch vs. /tmp, then I'm not sure why Hadoop would be
writing to /tmp.
Hope this helps!
Alex
Re: getting DiskErrorException during map
Posted by Jim Twensky <ji...@gmail.com>.
Alex,

Yes, I bounced the Hadoop daemons after I changed the configuration files.

I also tried setting $HADOOP_CONF_DIR to the directory where my
hadoop-site.xml file resides, but it didn't work.
However, I'm sure that HADOOP_CONF_DIR is not the issue, because other
properties that I changed in hadoop-site.xml seem to be properly set.
Also, here is a section from my hadoop-site.xml file:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/scratch/local/jim/hadoop-${user.name}</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/scratch/local/jim/hadoop-${user.name}/mapred/local</value>
</property>

I also created /scratch/local/jim/hadoop-jim/mapred/local on each task
tracker, since I know directories that do not exist are ignored.

When I manually ssh to the task trackers, I can see that the directory
/scratch/local/jim/hadoop-jim/dfs is automatically created, so it seems
like hadoop.tmp.dir is set properly. However, Hadoop still creates
/tmp/hadoop-jim/mapred/local and uses that directory for the local storage.

I'm starting to suspect that mapred.local.dir is overridden to a default
value of /tmp/hadoop-${user.name} somewhere inside the binaries.

-jim
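For what it's worth, the stock hadoop-default.xml in the 0.19 line does not hard-code a /tmp path for mapred.local.dir; as far as I recall, it defines it in terms of hadoop.tmp.dir:

```xml
<!-- hadoop-default.xml (0.19): mapred.local.dir follows hadoop.tmp.dir -->
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
```

So seeing /tmp/hadoop-jim/mapred/local usually means that, in that particular JVM, hadoop.tmp.dir itself resolved to its default of /tmp/hadoop-${user.name}, i.e. the site file was never loaded there, rather than the value being baked into the binaries.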
Re: getting DiskErrorException during map
Posted by Alex Loddengaard <al...@cloudera.com>.
First, did you bounce the Hadoop daemons after you changed the configuration
files? I think you'll have to do this.
Second, I believe 0.19.1 has hadoop-default.xml baked into the jar. Try
setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives. For
whatever reason, your hadoop-site.xml (and the hadoop-default.xml you tried
to change) is probably not being loaded. $HADOOP_CONF_DIR should fix this.
Good luck!
Alex
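Alex's suggestion, sketched as a shell session. The paths are illustrative (a throwaway directory under /tmp so the sketch runs anywhere); on a real cluster you would point HADOOP_CONF_DIR at the conf directory that already holds your hadoop-site.xml:

```shell
# Sketch: put hadoop-site.xml in one directory, point HADOOP_CONF_DIR at it,
# then bounce the daemons so every JVM re-reads the configuration.
CONF_DIR="/tmp/hadoop-conf-demo"
mkdir -p "$CONF_DIR"

# (on a real cluster this file would already exist in your conf directory)
cat > "$CONF_DIR/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/scratch/local/jim/hadoop-${user.name}</value>
  </property>
</configuration>
EOF

export HADOOP_CONF_DIR="$CONF_DIR"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"

# On the real cluster, then restart the daemons:
#   bin/stop-all.sh && bin/start-all.sh
```

The export needs to be in effect in the environment that launches the daemons (e.g. hadoop-env.sh or the login shell running start-all.sh), or it will not reach the TaskTracker JVMs.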
Re: getting DiskErrorException during map
Posted by Jim Twensky <ji...@gmail.com>.
Thank you Alex, you are right. There are quotas on the systems that I'm
working on. However, I tried to change mapred.local.dir as follows, inside
hadoop-site.xml:

<property>
  <name>mapred.child.tmp</name>
  <value>/scratch/local/jim</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/scratch/local/jim</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/scratch/local/jim</value>
</property>

and observed that the intermediate map outputs are still being written
under /tmp/hadoop-jim/mapred/local.

I'm confused at this point, since I also tried setting these values directly
inside hadoop-default.xml and that didn't work either. Is there any
other property that I'm supposed to change? I tried searching for "/tmp" in
the hadoop-default.xml file but couldn't find anything else.

Thanks,
Jim
Re: getting DiskErrorException during map
Posted by Alex Loddengaard <al...@cloudera.com>.
The getLocalPathForWrite function that throws this Exception assumes that
you have space on the disks that mapred.local.dir is configured on. Can you
verify with `df` that those disks have space available? You might also try
moving mapred.local.dir off of /tmp if it's configured to use /tmp right
now; I believe some systems have quotas on /tmp.
Hope this helps.
Alex
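The `df` check above can be sketched as a small loop over the configured directories. DIRS is an illustrative stand-in; paste in the comma-separated value of mapred.local.dir from your own config:

```shell
# Sketch: report free space for each directory listed in mapred.local.dir.
# DIRS is a placeholder; use your real comma-separated configuration value.
DIRS="/tmp,/var/tmp"

IFS=','
for d in $DIRS; do
  # df -P keeps one line of POSIX-format output per filesystem
  df -P "$d" | tail -n 1
done
unset IFS
```

If quotas rather than raw disk space are the limit, `df` can look healthy while writes still fail; on systems with user quotas enabled, a command like `quota -s` (where available) is worth checking too.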