Posted to user@nutch.apache.org by Alexei Korolev <al...@gmail.com> on 2013/03/04 09:48:01 UTC
Re: DiskChecker$DiskErrorException
Hello,
It's me again :) Error is back.
Could the reason be that I run this script while Nutch is crawling?
#!/bin/bash
NUTCH_PATH=/home/developer/crawler/apache-nutch-1.4-bin/runtime/local/bin/nutch
export JAVA_HOME=/usr/
rm -rf stats
$NUTCH_PATH domainstats crawl/crawldb/current stats host
$NUTCH_PATH readdb crawl/crawldb/ -stats
$NUTCH_PATH readseg -list -dir crawl/crawldb/segments
Maybe this script removed some essential files from the tmp directory?
Thanks.
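One way to keep these read-only commands from colliding with a crawl running in the same local-mode Hadoop working area is to serialize the two with a lock. This is only a sketch: the lock path is an assumption, and the commented command stands in for the stats commands above; the crawl script would need to take the same lock.

```shell
#!/bin/bash
# Sketch: take an exclusive lock before running the stats commands, so they
# never overlap with a crawl holding the same lock. The lock file path is an
# illustrative assumption; both scripts must agree on it.
LOCK=/tmp/nutch-local-mode.lock
(
  flock -n 9 || { echo "another Nutch job holds the lock, exiting" >&2; exit 1; }
  echo "lock acquired, safe to run the stats commands"
  # $NUTCH_PATH readdb crawl/crawldb/ -stats   # ...and the other commands
) 9>"$LOCK"
```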
On Mon, Feb 11, 2013 at 8:26 PM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:
> The conversation is about the /tmp space consumed by the Nutch crawl
> process.
> See Thu, 07 Feb, 14:12:
>
> http://mail-archives.apache.org/mod_mbox/nutch-user/201302.mbox/%3C3e2fc3ad-f049-4091-9ebf-9e624fb18250@ucimail3.uci.cu%3E
>
>
>
>
> ----- Original message -----
> From: "Alexei Korolev" <al...@gmail.com>
> To: user@nutch.apache.org
> Sent: Monday, February 11, 2013 10:50:44
> Subject: Re: DiskChecker$DiskErrorException
>
> Hi,
>
> Thank you for your input. du shows:
>
> root@Ubuntu-1110-oneiric-64-minimal:~# du -hs /tmp
> 5.1M /tmp
>
> About the thread: could you give me a more specific link? Right now it
> points to the whole archive for February 2013.
>
> Thanks.
>
> On Mon, Feb 11, 2013 at 7:13 PM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
>
> > Hi Alexei.
> > Make sure about Markus's suggestion: I had the same problem with /tmp
> > folder space while Nutch was crawling. This folder is cleaned when you
> > reboot the system, but Nutch checks the available space and it can throw
> > exceptions.
> > Verify the space with:
> > du -hs /tmp/
> > Also check this thread:
> > http://mail-archives.apache.org/mod_mbox/nutch-user/201302.mbox/browser
> >
> >
> >
> >
> >
> > ----- Original message -----
> > From: "Alexei Korolev" <al...@gmail.com>
> > To: user@nutch.apache.org
> > Sent: Monday, February 11, 2013 3:40:06
> > Subject: Re: DiskChecker$DiskErrorException
> >
> > Hi,
> >
> > Yes
> >
> > Filesystem 1K-blocks Used Available Use% Mounted on
> > /dev/md2 1065281580 592273404 419321144 59% /
> > udev 8177228 8 8177220 1% /dev
> > tmpfs 3274592 328 3274264 1% /run
> > none 5120 0 5120 0% /run/lock
> > none 8186476 0 8186476 0% /run/shm
> > /dev/md3 1808084492 15283960 1701678392 1% /home
> > /dev/md1 507684 38099 443374 8% /boot
> >
> > On Mon, Feb 11, 2013 at 12:33 PM, Markus Jelsma <ma...@openindex.io> wrote:
> >
> > > Hi- Also enough space in your /tmp directory?
> > >
> > > Cheers
> > >
> > >
> > >
> > > -----Original message-----
> > > > From: Alexei Korolev <al...@gmail.com>
> > > > Sent: Mon 11-Feb-2013 09:27
> > > > To: user@nutch.apache.org
> > > > Subject: DiskChecker$DiskErrorException
> > > >
> > > > Hello,
> > > >
> > > > Already twice I got this error:
> > > >
> > > > 2013-02-08 15:26:11,674 WARN mapred.LocalJobRunner - job_local_0001
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out in any of the configured local directories
> > > >         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
> > > >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
> > > >         at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
> > > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
> > > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
> > > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
> > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > > 2013-02-08 15:26:12,515 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
> > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> > > >         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1204)
> > > >         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1240)
> > > >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1213)
> > > >
> > > > I've checked Google, but no luck. I run Nutch 1.4 locally and have
> > > > plenty of free space on disk.
> > > > I would much appreciate some help.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > --
> > > > Alexei A. Korolev
> > > >
> > >
> >
> >
> >
> > --
> > Alexei A. Korolev
> >
>
>
>
> --
> Alexei A. Korolev
>
--
Alexei A. Korolev
Re: DiskChecker$DiskErrorException
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Alexei,
In principle, in local mode you cannot run more than one Hadoop job
concurrently unless you use disjoint hadoop.tmp.dir properties.
There have been a few posts on this list about this topic.
I'm not 100% sure whether the commands in your script are the reason,
because they should only read data and not write anything.
Sebastian
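The disjoint-hadoop.tmp.dir suggestion could be sketched as below, assuming a per-run conf directory passed to bin/nutch through the NUTCH_CONF_DIR environment variable. The paths and the generated file are illustrative, not Nutch defaults, and a real setup would copy the rest of the original conf directory alongside the override.

```shell
#!/bin/bash
# Sketch: give one local-mode run its own hadoop.tmp.dir by generating a
# private conf dir containing an overriding nutch-site.xml.
set -e
RUN_CONF=$(mktemp -d)   # private conf dir for this run
RUN_TMP=$(mktemp -d)    # private hadoop.tmp.dir, disjoint from other runs

cat > "$RUN_CONF/nutch-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$RUN_TMP</value>
  </property>
</configuration>
EOF

# The stats script would then invoke, e.g.:
#   NUTCH_CONF_DIR="$RUN_CONF" $NUTCH_PATH readdb crawl/crawldb/ -stats
grep "hadoop.tmp.dir" "$RUN_CONF/nutch-site.xml"
```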