Posted to user@nutch.apache.org by annemarie♥ <dr...@gmail.com> on 2010/01/25 06:50:05 UTC

IOException: Spill failed on hadoop.mapred.MapTask on fetch command

Hi all,

Need help on the error I encountered with Nutch/Hadoop.
I executed the following commands:

bin/nutch generate crawled/crawldb crawled/segments  -adddays 10
bin/nutch fetch crawled/segments/20100122202241 -threads 50

While performing a fetch of 1000 URLs, I got the following errors:

fetching http://www.studyfinder.qut.edu.au/cgi-bin/WebObjects/StudyFinder.woa/wo/0.0.17.51.0.3.1
java.io.IOException: Spill failed
       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:822)
       at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:884)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:645)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill6
       at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
       at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
       at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1183)
       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask


Before this error, fetches had been running successfully. After this
error first appeared, every subsequent fetch command also failed with
the same Spill failed error.
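
From the trace, the immediate cause seems to be that Hadoop's LocalDirAllocator could not find a local directory with enough free space for the spill file. In local (non-distributed) mode those spill files normally go under hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}. A minimal check, assuming those defaults (adjust the paths if hadoop.tmp.dir has been overridden):

# check free space on the volume holding Hadoop's local temp directory
df -h /tmp
# see how much of it the local job data is using (path assumed from defaults)
du -sh /tmp/hadoop-$USER/mapred/local 2>/dev/null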

The fetch command itself completed, but I then got a java.io.IOException
when I ran the updatedb command:

bin/nutch updatedb crawled/crawldb/ crawled/segments/*
CrawlDb update: starting
CrawlDb update: db: crawled/crawldb
CrawlDb update: segments: [crawled/segments/20100122022740,
crawled/segments/20100122021153, crawled/segments/20100122193552,
crawled/segments/20100122034328,
crawled/segments/20100122022121,
crawled/segments/20100122020605]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: false
CrawlDb update: URL filtering: false
CrawlDb update: Merging segment data into db.
CrawlDb update: java.io.IOException: Job failed!
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
       at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:94)
       at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:189)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:150)
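
The generic "Job failed!" from JobClient.runJob hides the underlying task error; in local mode that detail normally ends up in the Nutch log. A quick check, assuming the default log location logs/hadoop.log:

# search the Nutch/Hadoop log for the underlying task error (log path assumed)
grep -n -A 5 -E "Spill failed|DiskErrorException|No space left" logs/hadoop.log | tail -n 40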

Thank you for your help.

Regards,
Anne

<>< ..too blessed to be stressed.. <><

Re: IOException: Spill failed on hadoop.mapred.MapTask on fetch command

Posted by annemarie♥ <dr...@gmail.com>.
Thank you, Julien!

I will look into this issue.

<>< ..too blessed to be stressed.. <><


On Mon, Jan 25, 2010 at 5:01 PM, Julien Nioche <lists.digitalpebble@gmail.com> wrote:

> you probably ran out of disk space
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>

Re: IOException: Spill failed on hadoop.mapred.MapTask on fetch command

Posted by Julien Nioche <li...@gmail.com>.
you probably ran out of disk space
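
If that is the case, a rough sketch of a workaround, assuming local (non-distributed) mode where map spills land under hadoop.tmp.dir (default /tmp/hadoop-${user.name}); the directory names below are only examples:

# with no job running, clear out stale local job data ...
rm -rf /tmp/hadoop-$USER/mapred/local/taskTracker/jobcache/*
# ... and/or move hadoop.tmp.dir to a volume with more space, e.g. by adding
# an override to conf/nutch-site.xml (property value is just an example):
#   <property>
#     <name>hadoop.tmp.dir</name>
#     <value>/data/hadoop-tmp</value>
#   </property>
mkdir -p /data/hadoop-tmp

Then re-run the fetch and updatedb steps once there is enough free space.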




-- 
DigitalPebble Ltd
http://www.digitalpebble.com