Posted to dev@nutch.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/02 18:53:01 UTC
[jira] Commented: (NUTCH-159) Specify temp/working directory for crawl
[ http://issues.apache.org/jira/browse/NUTCH-159?page=comments#action_12361541 ]
Doug Cutting commented on NUTCH-159:
------------------------------------
mapred.local.dir is the thing to set. If that fails, then there is a bug. What did you have it set to?
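One way to confirm that the directories named in mapred.local.dir have room before starting a large crawl is a small check like this one. It is a sketch, not part of Nutch. It assumes the property may hold a comma-separated list of directories. The class and method names (LocalDirCheck, parseLocalDirs, hasRoom), the example path and the 1 GiB threshold are all hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: splits a mapred.local.dir value (assumed to be a
 * comma-separated list of directories) and reports free space on each one.
 */
public class LocalDirCheck {

    /** Splits a comma-separated directory list, trimming blanks and skipping empty entries. */
    static List<File> parseLocalDirs(String value) {
        List<File> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(new File(trimmed));
            }
        }
        return dirs;
    }

    /** True if the directory, or its nearest existing ancestor, has at least minFreeBytes usable. */
    static boolean hasRoom(File dir, long minFreeBytes) {
        File probe = dir.getAbsoluteFile();
        // Walk up to an existing ancestor so a not-yet-created directory can still be checked.
        while (probe != null && !probe.exists()) {
            probe = probe.getParentFile();
        }
        return probe != null && probe.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Hypothetical example value; pass your actual mapred.local.dir setting as args[0].
        String value = args.length > 0 ? args[0] : "/data/nutch/local";
        long minFree = 1L << 30; // assumed threshold: 1 GiB of scratch space
        for (File dir : parseLocalDirs(value)) {
            System.out.printf("%s: %s%n", dir, hasRoom(dir, minFree) ? "ok" : "LOW SPACE");
        }
    }
}
```

Running it with the configured value (for example `java LocalDirCheck /data/nutch/local`) gives roughly the same answer as `df -k`, but only for the directories that will actually be used as scratch space.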
> Specify temp/working directory for crawl
> ----------------------------------------
>
> Key: NUTCH-159
> URL: http://issues.apache.org/jira/browse/NUTCH-159
> Project: Nutch
> Type: Bug
> Components: fetcher, indexer
> Versions: 0.8-dev
> Environment: Linux/Debian
> Reporter: byron miller
>
> I ran a crawl of 100k web pages and got:
> org.apache.nutch.fs.FSError: java.io.IOException: No space left on device
> at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:149)
> at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:65)
> at org.apache.nutch.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:178)
> at org.apache.nutch.fs.NutchFileSystem.rename(NutchFileSystem.java:224)
> at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
> Caused by: java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:260)
> at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:147)
> ... 4 more
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
> at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:335)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:107)
> byron@db02:/data/nutch$ df -k
> It appears the crawl created a /tmp/nutch directory that filled up, even though I specified a db directory.
> We need to add a command-line parameter, or a globally configurable /tmp (work area) for the Nutch instance, so that crawls won't fail.
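The reporter's request (a command-line parameter or a globally configurable work area) amounts to a simple precedence rule. The sketch shows one way to express it. The flag name "-workdir" and the property name "crawl.work.dir" are hypothetical and are not existing Nutch options. The "/tmp/nutch" fallback only mirrors the directory the reporter observed, and is not a documented default.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: choose a work area from a command-line flag first,
 * then a site-wide config property, then a built-in fallback.
 * "-workdir" and "crawl.work.dir" are illustrative names only.
 */
public class WorkDirResolver {

    // Mirrors the directory seen in the report; not a documented Nutch default.
    static final String DEFAULT_WORK_DIR = "/tmp/nutch";

    static String resolve(String[] args, Properties conf) {
        // 1. An explicit command-line flag wins (a trailing flag with no value is ignored).
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-workdir")) {
                return args[i + 1];
            }
        }
        // 2. Otherwise use the site-wide configuration value, if it is set and non-blank.
        String fromConf = conf.getProperty("crawl.work.dir");
        if (fromConf != null && !fromConf.trim().isEmpty()) {
            return fromConf.trim();
        }
        // 3. Fall back to the built-in default.
        return DEFAULT_WORK_DIR;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("crawl.work.dir", "/data/nutch/work"); // hypothetical site setting
        System.out.println(resolve(args, conf));
    }
}
```

Doug's reply suggests that mapred.local.dir already plays the role of the configurable setting in step 2, so a new option may not be needed once that property is set correctly.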
--
This message is automatically generated by JIRA.