Posted to user@nutch.apache.org by Alexei Korolev <al...@gmail.com> on 2013/03/04 09:48:01 UTC

Re: DiskChecker$DiskErrorException

Hello,

It's me again :) The error is back.

Could the reason be that I run this script while Nutch is crawling?

#!/bin/bash

NUTCH_PATH=/home/developer/crawler/apache-nutch-1.4-bin/runtime/local/bin/nutch

export JAVA_HOME=/usr/

rm -rf stats

$NUTCH_PATH domainstats crawl/crawldb/current stats host
$NUTCH_PATH readdb crawl/crawldb/ -stats
$NUTCH_PATH readseg -list -dir crawl/crawldb/segments

Maybe this script removed some essential files from the tmp directory?

Thanks.
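The overlap the question worries about (the stats script and the crawl both running local-mode Hadoop jobs at the same time) could be avoided by serializing the two workloads. A minimal sketch, assuming util-linux flock is available and that both sides agree on the same lock file; the lock path and the run_stats placeholder are hypothetical examples, not part of the original script:

```shell
#!/bin/bash
# Sketch: take an exclusive lock before running the stats commands so
# they never overlap with a crawl that guards itself with the same file.
LOCKFILE="${TMPDIR:-/tmp}/nutch-local.lock"

run_stats() {
    # Placeholder for the real commands, e.g.:
    #   $NUTCH_PATH readdb crawl/crawldb/ -stats
    echo "stats would run here"
}

(
    # Wait up to 600 s for the crawl to release the lock, then proceed.
    flock -w 600 9 || { echo "timed out waiting for the crawl" >&2; exit 1; }
    run_stats
) 9>"$LOCKFILE"
```

The crawl job would wrap its own invocation in the same `flock … 9>"$LOCKFILE"` pattern, so only one local-mode job touches the shared tmp directories at a time.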


On Mon, Feb 11, 2013 at 8:26 PM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> the conversation is about the consumption of the /tmp folder by the
> nutch crawl process.
> See Thu, 07 Feb, 14:12:
>
> http://mail-archives.apache.org/mod_mbox/nutch-user/201302.mbox/%3C3e2fc3ad-f049-4091-9ebf-9e624fb18250@ucimail3.uci.cu%3E
>
>
>
>
> ----- Original message -----
> From: "Alexei Korolev" <al...@gmail.com>
> To: user@nutch.apache.org
> Sent: Monday, 11 February 2013 10:50:44
> Subject: Re: DiskChecker$DiskErrorException
>
> Hi,
>
> Thank you for your input. du shows:
>
> root@Ubuntu-1110-oneiric-64-minimal:~# du -hs /tmp
> 5.1M    /tmp
>
> About the thread: could you give me a more specific link? Right now it
> points to the whole archive of Feb 2013.
>
> Thanks.
>
> On Mon, Feb 11, 2013 at 7:13 PM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
>
> > Hi Alexei.
> > Check Markus's suggestion; I had the same problem with /tmp folder
> > space while Nutch was crawling. That folder is cleaned when you reboot the
> > system, but Nutch checks the available space and can throw exceptions.
> > Verify the space with
> > du -hs /tmp/
> > Also check this thread:
> > http://mail-archives.apache.org/mod_mbox/nutch-user/201302.mbox/browser
> >
> >
> >
> >
> >
> > ----- Original message -----
> > From: "Alexei Korolev" <al...@gmail.com>
> > To: user@nutch.apache.org
> > Sent: Monday, 11 February 2013 3:40:06
> > Subject: Re: DiskChecker$DiskErrorException
> >
> > Hi,
> >
> > Yes
> >
> > Filesystem           1K-blocks      Used Available Use% Mounted on
> > /dev/md2             1065281580 592273404 419321144  59% /
> > udev                   8177228         8   8177220   1% /dev
> > tmpfs                  3274592       328   3274264   1% /run
> > none                      5120         0      5120   0% /run/lock
> > none                   8186476         0   8186476   0% /run/shm
> > /dev/md3             1808084492  15283960 1701678392   1% /home
> > /dev/md1                507684     38099    443374   8% /boot
> >
> > On Mon, Feb 11, 2013 at 12:33 PM, Markus Jelsma
> > <ma...@openindex.io> wrote:
> >
> > > Hi- Also enough space in your /tmp directory?
> > >
> > > Cheers
> > >
> > >
> > >
> > > -----Original message-----
> > > > From: Alexei Korolev <al...@gmail.com>
> > > > Sent: Mon 11-Feb-2013 09:27
> > > > To: user@nutch.apache.org
> > > > Subject: DiskChecker$DiskErrorException
> > > >
> > > > Hello,
> > > >
> > > > Already twice I got this error:
> > > >
> > > > 2013-02-08 15:26:11,674 WARN  mapred.LocalJobRunner - job_local_0001
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > > > taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out
> > > > in any of the configured local directories
> > > >         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
> > > >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
> > > >         at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
> > > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
> > > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
> > > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
> > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > > 2013-02-08 15:26:12,515 ERROR fetcher.Fetcher - Fetcher:
> > > > java.io.IOException: Job failed!
> > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> > > >         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1204)
> > > >         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1240)
> > > >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1213)
> > > >
> > > > I've checked on Google, but no luck. I run Nutch 1.4 locally and have
> > > > plenty of free space on disk.
> > > > I would much appreciate some help.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > --
> > > > Alexei A. Korolev
> > > >
> > >
> >
> >
> >
> > --
> > Alexei A. Korolev
> >
>
>
>
> --
> Alexei A. Korolev
>



-- 
Alexei A. Korolev

Re: DiskChecker$DiskErrorException

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Alexei,

In principle, in local mode you cannot run more than one Hadoop job
concurrently, unless you use disjoint hadoop.tmp.dir properties.
There have been a few posts on this list about this topic.

I'm not 100% sure whether the commands in your script are the reason,
because they should only read data and not write anything.

Sebastian
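
The disjoint hadoop.tmp.dir suggestion could look like the fragment below, placed in the conf/nutch-site.xml of the Nutch instance that runs the stats script; the directory path is a hypothetical example, not taken from the thread:

<!-- Sketch of a conf/nutch-site.xml fragment: give this instance its own
     Hadoop scratch directory so its local-mode jobs cannot collide with
     another instance writing under the default /tmp location. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/developer/crawler/tmp-stats</value>
</property>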

On 03/04/2013 09:48 AM, Alexei Korolev wrote:
> Hello,
> 
> It's me again :) The error is back.
> 
> Could the reason be that I run this script while Nutch is crawling?
> 
> #!/bin/bash
> 
> NUTCH_PATH=/home/developer/crawler/apache-nutch-1.4-bin/runtime/local/bin/nutch
> 
> export JAVA_HOME=/usr/
> 
> rm -rf stats
> 
> $NUTCH_PATH domainstats crawl/crawldb/current stats host
> $NUTCH_PATH readdb crawl/crawldb/ -stats
> $NUTCH_PATH readseg -list -dir crawl/crawldb/segments
> 
> Maybe this script removed some essential files from the tmp directory?
> 
> Thanks.