Posted to user@nutch.apache.org by Eyeris Rodriguez Rueda <er...@uci.cu> on 2013/02/07 15:12:08 UTC

Could not find any valid local directory for output/file.out

Hi all.
I have a problem when I run a crawl for a few hours or days. I'm using Nutch 1.5.1 and Solr 3.6, but the crawl process fails and I don't know how to fix it. I want to run a crawl without limits for 10 cycles or more, but I run out of hard disk space: I have found that /etc/tmp has 29 GB used, which is too much for me. Can anybody help me or give me some advice on configuring Nutch so that at least one crawl completes without problems?

Here are some details of my environment:
RAM: 2 GB
CPU: quad-core (but I'm using only 2 cores)
Hard disk: 40 GB
Threads: 50
db.fetch.interval.default = 2 days



This is part of my log file from when Nutch fails:

****************************************************************
2013-02-06 18:45:25,961 INFO  fetcher.Fetcher - fetching http://bibliodoc.uci.cu/TD/TD_03349_10.pdf
2013-02-06 18:45:25,964 INFO  fetcher.Fetcher - fetching http://bibliodoc.uci.cu/TD/TD_0442_07.pdf
2013-02-06 18:45:25,977 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=49
2013-02-06 18:45:26,109 INFO  fetcher.Fetcher - -activeThreads=49, spinWaiting=39, fetchQueues.totalSize=0
2013-02-06 18:45:26,180 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=48
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=47
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=46
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=44
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=45
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=40
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=39
2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=38
2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=37
2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=36
2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=35
2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=34
2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=33
2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=32
2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=31
2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=30
2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=29
2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=28
2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=27
2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=26
2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=25
2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=24
2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=23
2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=22
2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=21
2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=20
2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=19
2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=18
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=41
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=17
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=15
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=13
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=12
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=42
2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=43
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=9
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=10
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=11
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=14
2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=16
2013-02-06 18:45:26,404 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=8
2013-02-06 18:45:26,630 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=7
2013-02-06 18:45:27,069 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=6
2013-02-06 18:45:27,110 INFO  fetcher.Fetcher - -activeThreads=6, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:27,129 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=5
2013-02-06 18:45:28,110 INFO  fetcher.Fetcher - -activeThreads=5, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:28,502 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=4
2013-02-06 18:45:29,111 INFO  fetcher.Fetcher - -activeThreads=4, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:30,123 INFO  fetcher.Fetcher - -activeThreads=4, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:31,127 INFO  fetcher.Fetcher - -activeThreads=4, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:31,187 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=3
2013-02-06 18:45:32,171 INFO  fetcher.Fetcher - -activeThreads=3, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:32,206 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=2
2013-02-06 18:45:33,173 INFO  fetcher.Fetcher - -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:34,173 INFO  fetcher.Fetcher - -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:34,205 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2013-02-06 18:45:34,457 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0
2013-02-06 18:45:35,742 WARN  mapred.LocalJobRunner - job_local_0015
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
	at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Re: Could not find any valid local directory for output/file.out

Posted by Lewis John Mcgibbney <le...@gmail.com>.
This is strictly a Hadoop configuration property, which is why it is not
included in nutch-default.xml by default.
However, you can override it as follows:

<property>
<name>hadoop.tmp.dir</name>
<value>${path/to/hadoop/temp}</value>
</property>

I'll add this explicitly to the wiki

Lewis
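
As a sketch of where the override above might live, the fragment below shows a minimal nutch-site.xml carrying the property. The path /data/nutch/tmp is a hypothetical example, not something taken from this thread; any directory on a partition with enough free space, writable by the user running the crawl, should serve.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides (illustrative sketch) -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Hypothetical path; replace with a directory on a large partition -->
    <value>/data/nutch/tmp</value>
  </property>
</configuration>
```

After changing the value, it is probably worth confirming with du or df that temporary job data is actually being written to the new location during the next crawl cycle.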

On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> Thanks to all for your replies.
> If I want to change the default location for Hadoop jobs (/tmp), where can
> I do that? My nutch-site.xml does not include anything pointing to /tmp.
> I have read about Nutch and Hadoop, but I'm not sure I understand it
> all. Is it possible to use Nutch 1.5.1 in distributed mode? If so, what
> do I need to do? I would really appreciate your answer, because I can't
> find good documentation on this topic.
>
>
>
>
> ----- Original message -----
> From: "Tejas Patil" <te...@gmail.com>
> To: user@nutch.apache.org
> Sent: Thursday, February 7, 2013 14:04:26
> Subject: Re: Could not find any valid local directory for output/file.out
>
> Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
> Hadoop to store temporary data required for a job. If you don't override
> hadoop.tmp.dir in any config file, it will use /tmp by default. In your
> case, /tmp doesn't have ample space left, so it is better to override that
> property and point it to some other location which has ample space.
>
> Thanks,
> Tejas Patil
>
>
> On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
>
> > Thanks, Lewis, for your answer.
> > My question is why /tmp grows while the crawl process is running, and why
> > Nutch uses that folder. I'm using Nutch 1.5.1 in single mode and my
> > nutch-site.xml does not have the hadoop.tmp.dir property. I need to reduce
> > the space used by that folder because I only have 40 GB for the Nutch
> > machine and 50 GB for the Solr machine. Any advice or explanation is welcome.
> > Thanks for your time.
> >
> >
> >
> > ----- Original message -----
> > From: "Lewis John Mcgibbney" <le...@gmail.com>
> > To: user@nutch.apache.org
> > Sent: Thursday, February 7, 2013 13:06:11
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > Hi,
> >
> > https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching
> >
> >
> > --
> > *Lewis*
> >
>



-- 
*Lewis*

RE: Could not find any valid local directory for output/file.out

Posted by Markus Jelsma <ma...@openindex.io>.
The /tmp directory is not cleaned up, IIRC. It is safe to empty it as long as you don't have a job running ;)
 

Re: Could not find any valid local directory for output/file.out

Posted by Lewis John Mcgibbney <le...@gmail.com>.
+1
This is a ridiculous tmp size for such a small crawldb.
There is clearly something wrong.

On Friday, February 8, 2013, Tejas Patil <te...@gmail.com> wrote:
> I don't think there is any such property. Maybe it's time for you to clean up
> /tmp :)
>
> Thanks,
> Tejas Patil
>
>
> On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
>
>> Hi Lewis and Tejas again.
>> I have set the hadoop.tmp.dir property, but Nutch is still consuming too
>> much space for me.
>> Is it possible to reduce the space Nutch uses in my tmp folder with some
>> property of the fetcher process? I always get an exception because the hard
>> disk is full. My crawldb is only 150 MB, no more, but my tmp folder
>> keeps growing without control up to 60 GB, and the crawl fails at that point.
>> Please, any help is welcome.
>>
>>
>>
>>
>> ----- Original message -----
>> From: "Eyeris Rodriguez Rueda" <er...@uci.cu>
>> To: user@nutch.apache.org
>> Sent: Friday, February 8, 2013 10:45:52
>> Subject: Re: Could not find any valid local directory for output/file.out
>>
>> Thanks a lot, Lewis and Tejas, you have been very helpful.
>> It works: I pointed it to another partition and it is OK.
>> Problem solved.
>>
>>
>>
>>
>>
>> ----- Original message -----
>> From: "Tejas Patil" <te...@gmail.com>
>> To: user@nutch.apache.org
>> Sent: Thursday, February 7, 2013 16:32:33
>> Subject: Re: Could not find any valid local directory for output/file.out
>>
>> On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
>>
>> > Thanks to all for your replies.
>> > If I want to change the default location for Hadoop jobs (/tmp), where can
>> > I do that? My nutch-site.xml does not include anything pointing to /tmp.
>> >
>> Add this property to nutch-site.xml with an appropriate value:
>>
>> <property>
>> <name>hadoop.tmp.dir</name>
>> <value>XXXXXXXXXX</value>
>> </property>
>>
>>
>>
>> > I have read about Nutch and Hadoop, but I'm not sure I understand it
>> > all. Is it possible to use Nutch 1.5.1 in distributed mode?
>>
>> yes
>>
>>
>> > If so, what do I need to do? I would really appreciate your answer,
>> > because I can't find good documentation on this topic.
>> >
>> For distributed mode, Nutch is called from runtime/deploy. The conf files
>> should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
>> So modify runtime/local/conf/nutch-site.xml to set
>> http.agent.name properly. I am assuming that the Hadoop setup is in
>> place and the Hadoop variables are exported. Now, run the Nutch commands
>> from runtime/deploy.
>>
>> Thanks,
>> Tejas Patil
>>
>> >
>> >
>> >
>> > ----- Original message -----
>> > From: "Tejas Patil" <te...@gmail.com>
>> > To: user@nutch.apache.org
>> > Sent: Thursday, February 7, 2013 14:04:26
>> > Subject: Re: Could not find any valid local directory for output/file.out
>> >
>> > Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
>> > Hadoop to store temporary data required for a job. If you don't override
>> > hadoop.tmp.dir in any config file, it will use /tmp by default. In your
>> > case, /tmp doesn't have ample space left, so it is better to override that
>> > property and point it to some other location which has ample space.
>> >
>> > Thanks,
>> > Tejas Patil
>> >
>> >
>> > On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
>> >
>> > > Thanks, Lewis, for your answer.
>> > > My question is why /tmp grows while the crawl process is running, and why
>> > > Nutch uses that folder. I'm using Nutch 1.5.1 in single mode and my
>> > > nutch-site.xml does not have the hadoop.tmp.dir property. I need to reduce
>> > > the space used by that folder because I only have 40 GB for the Nutch
>> > > machine and 50 GB for the Solr machine. Please some advice or expla

-- 
*Lewis*

Re: Could not find any valid local directory for output/file.out

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Is truncating content not a possibility? By default, parsing is skipped for
truncated docs IIRC.
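
One way to act on the truncation suggestion, sketched under assumptions: the property names below (http.content.limit, file.content.limit, parser.skip.truncated) are assumed to match those in the nutch-default.xml of your release, and the 1 MB limit is an arbitrary illustrative value. Check the names and defaults against the nutch-default.xml shipped with your version before relying on them.

```xml
<!-- Sketch for conf/nutch-site.xml; values are illustrative assumptions -->
<property>
  <name>http.content.limit</name>
  <!-- Maximum bytes kept per document fetched over HTTP; longer content is truncated -->
  <value>1048576</value>
</property>
<property>
  <name>file.content.limit</name>
  <!-- Same idea for documents read through the file protocol -->
  <value>1048576</value>
</property>
<property>
  <name>parser.skip.truncated</name>
  <!-- Skip parsing of documents that were cut off at the limit -->
  <value>true</value>
</property>
```

With a limit like this, a large PDF would be stored only up to the cap and not parsed, which trades completeness of the index for bounded disk use per document.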



On Fri, Feb 8, 2013 at 4:18 PM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> I have an idea of what the problem was: there is a URL that contains a
> repository of PDF documents, and Nutch spends a very long time on that
> domain. I'm running the crawl without the topN parameter, so Nutch was
> trying to fetch every PDF on that site.
> Is it possible to configure Nutch to crawl without topN and still restrict
> the number of URLs fetched? I'm thinking of fetching in blocks on each cycle
> to limit the amount of space used in /tmp.
> That would be great, because if Nutch finds a collection of PDFs bigger than
> our hard disk, it will fail.
>
>

Re: Could not find any valid local directory for output/file.out

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
I have an idea of what the problem was: there is a URL that contains a repository of PDF documents, and Nutch spends a very long time on that domain. I'm running the crawl without the topN parameter, so Nutch was trying to fetch every PDF on that site.
Is it possible to configure Nutch to crawl without topN and still restrict the number of URLs fetched? I'm thinking of fetching in blocks on each cycle to limit the amount of space used in /tmp.
That would be great, because if Nutch finds a collection of PDFs bigger than our hard disk, it will fail.
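
On limiting how much is fetched per cycle: besides passing -topN to the generate step, the generator can cap the number of URLs taken from each host or domain in a single fetchlist. The fragment below is a sketch that assumes the generate.max.count and generate.count.mode properties are present in your release's nutch-default.xml; the cap of 100 is an arbitrary example value, not a recommendation from this thread.

```xml
<!-- Sketch for conf/nutch-site.xml; the cap of 100 is an illustrative assumption -->
<property>
  <name>generate.count.mode</name>
  <!-- Count URLs per host; 'domain' is the other commonly documented mode -->
  <value>host</value>
</property>
<property>
  <name>generate.max.count</name>
  <!-- At most this many URLs per host in one fetchlist; -1 means unlimited -->
  <value>100</value>
</property>
```

A per-host cap like this would keep a single PDF repository from dominating one cycle, so the temporary data produced by each fetch stays bounded while the remaining URLs are picked up in later cycles.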

 

----- Original message -----
From: "Markus Jelsma" <ma...@openindex.io>
To: user@nutch.apache.org
Sent: Friday, February 8, 2013 14:56:17
Subject: RE: Could not find any valid local directory for output/file.out

Hadoop stores temporary files there, such as shuffled map output data, so you need it! But you can rm -r it after a complete crawl cycle. Do not clear it while a job is running, or the job is going to miss its temp files.
 
-----Original message-----
> From:Eyeris Rodriguez Rueda <er...@uci.cu>
> Sent: Fri 08-Feb-2013 20:53
> To: user@nutch.apache.org
> Subject: Re: Could not find any valid local directory for output/file.out
> 
> I'm using Ubuntu Server 12.04 only for Nutch, and I have assigned 40 GB to it. Is /tmp needed for the Nutch crawl process, or can I set up a crontab to delete the contents of /tmp without causing problems for the crawl?
> 
> 
> 
> 
> ----- Original message -----
> From: "Tejas Patil" <te...@gmail.com>
> To: user@nutch.apache.org
> Sent: Friday, February 8, 2013 14:33:25
> Subject: Re: Could not find any valid local directory for output/file.out
>
> I don't think there is any such property. Maybe it's time for you to clean up
> /tmp :)
> 
> Thanks,
> Tejas Patil
> 
> 
> On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:
>
> > Hi Lewis and Tejas again.
> > I have set the hadoop.tmp.dir property, but Nutch is still consuming too
> > much space for me.
> > Is it possible to reduce the space Nutch uses in my tmp folder with some
> > property of the fetcher process? I always get an exception because the hard
> > disk is full. My crawldb is only 150 MB, no more, but my tmp folder
> > keeps growing without control up to 60 GB, and the crawl fails at that point.
> > Please, any help is welcome.
> >
> >
> >
> >
> > ----- Original message -----
> > From: "Eyeris Rodriguez Rueda" <er...@uci.cu>
> > To: user@nutch.apache.org
> > Sent: Friday, February 8, 2013 10:45:52
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > Thanks a lot, Lewis and Tejas, you have been very helpful.
> > It works: I pointed it to another partition and it is OK.
> > Problem solved.
> >
> >
> >
> >
> >
> > ----- Original message -----
> > From: "Tejas Patil" <te...@gmail.com>
> > To: user@nutch.apache.org
> > Sent: Thursday, February 7, 2013 16:32:33
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
> >
> > > Thanks to all for your replies.
> > > If I want to change the default location for Hadoop jobs (/tmp), where can
> > > I do that? My nutch-site.xml does not include anything pointing to /tmp.
> > >
> > Add this property to nutch-site.xml with an appropriate value:
> >
> > <property>
> > <name>hadoop.tmp.dir</name>
> > <value>XXXXXXXXXX</value>
> > </property>
> >
> >
> >
> > > So I have readed about nutch and hadoop but im not sure to understand at
> > > all. Is posible to use nutch 1.5.1 in distributed mode ?
> >
> > yes
> >
> >
> > > In this case what i need to do for that, I really appreciated your answer
> > > because I can´t find a good documentation for this topic.
> > >
> > For distributed mode, Nutch is called from runtime/deploy. The conf files
> > should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
> > So modify the runtime/local/conf/nutch-site.xml to set
> > http.agent.nameproperly.  I am assuming that the hadoop setup is in
> > place and hadoop
> > variables are exported. Now, run the nutch commands from runtime/deploy.
> >
> > Thanks,
> > Tejas Patil
> >
> > >
> > >
> > >
> > > ----- Mensaje original -----
> > > De: "Tejas Patil" <te...@gmail.com>
> > > Para: user@nutch.apache.org
> > > Enviados: Jueves, 7 de Febrero 2013 14:04:26
> > > Asunto: Re: Could not find any valid local directory for output/file.out
> > >
> > > Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
> > > hadoop to store temporary data required for a job. If you dont over-ride
> > > hadoop.tmp.dir in any config file, it will use /tmp by default. In your
> > > case, /tmp doesnt have ample space left so better over-ride that property
> > > and point it to some other location which has ample space.
> > >
> > > Thanks,
> > > Tejas Patil
> > >
> > >
> > > On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <erueda@uci.cu
> > > >wrote:
> > >
> > > > Thanks lewis by your answer.
> > > > My doubt is why /tmp is increasing while crawl process is doing, and
> > why
> > > > nutch use that folder. Im using nutch 1.5.1 in single mode and my nutch
> > > > site not have properties hadoop.tmp.dir. I need reduce the space used
> > for
> > > > that folder because I only have 40 GB for nutch machine and 50 GB for
> > > solr
> > > > machine. Please some advice or explanation will be accepted.
> > > > Thanks for your time.
> > > >
> > > >
> > > >
> > > > ----- Mensaje original -----
> > > > De: "Lewis John Mcgibbney" <le...@gmail.com>
> > > > Para: user@nutch.apache.org
> > > > Enviados: Jueves, 7 de Febrero 2013 13:06:11
> > > > Asunto: Re: Could not find any valid local directory for
> > output/file.out
> > > >
> > > > Hi,
> > > >
> > > >
> > > >
> > >
> > https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching
> > > >
> > > > On Thursday, February 7, 2013, Eyeris Rodriguez Rueda <er...@uci.cu>
> > > > wrote:
> > > > > Hi all.
> > > > > I have a problem when i do a crawl for few hour or days, im using
> > nutch
> > > > 1.5.1 and solr 3.6, but the crawl process fails and i dont know how to
> > > fix
> > > > this problem, im intersted in make a crawl process without limit with
> > 10
> > > > cicles or more but i have problem with space on hard disk, i have
> > > detected
> > > > that /etc/tmp have 29 GB used and is not good for me, any body can help
> > > me
> > > > or give some advices for configure nutch to make at least one crawl
> > > process
> > > > without problems ?
> > > > >
> > > > > here some features of my environment
> > > > > Ram 2 GB
> > > > > CPU:QuadCore(but im using only 2 cores)
> > > > > Hard Disk:40 GB
> > > > > Threads:50
> > > > > db.fetch.interval.default=2 days
> > > > >
> > > > >
> > > > >
> > > > > this is a part of my log file when nutch fails:
> > > > >
> > > > > ****************************************************************
> > > > > 2013-02-06 18:45:25,961 INFO  fetcher.Fetcher - fetching
> > > > http://bibliodoc.uci.cu/TD/TD_03349_10.pdf
> > > > > 2013-02-06 18:45:25,964 INFO  fetcher.Fetcher - fetching
> > > > http://bibliodoc.uci.cu/TD/TD_0442_07.pdf
> > > > > 2013-02-06 18:45:25,977 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=49
> > > > > 2013-02-06 18:45:26,109 INFO  fetcher.Fetcher - -activeThreads=49,
> > > > spinWaiting=39, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:26,180 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=48
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=47
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=46
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=44
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=45
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=40
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=39
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=38
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=37
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=36
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=35
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=34
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=33
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=32
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=31
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=30
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=29
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=28
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=27
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=26
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=25
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=24
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=23
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=22
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=21
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=20
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=19
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=18
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=41
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=17
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=15
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=13
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=12
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=42
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=43
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=9
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=10
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=11
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=14
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=16
> > > > > 2013-02-06 18:45:26,404 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=8
> > > > > 2013-02-06 18:45:26,630 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=7
> > > > > 2013-02-06 18:45:27,069 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=6
> > > > > 2013-02-06 18:45:27,110 INFO  fetcher.Fetcher - -activeThreads=6,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:27,129 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=5
> > > > > 2013-02-06 18:45:28,110 INFO  fetcher.Fetcher - -activeThreads=5,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:28,502 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=4
> > > > > 2013-02-06 18:45:29,111 INFO  fetcher.Fetcher - -activeThreads=4,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:30,123 INFO  fetcher.Fetcher - -activeThreads=4,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:31,127 INFO  fetcher.Fetcher - -activeThreads=4,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:31,187 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=3
> > > > > 2013-02-06 18:45:32,171 INFO  fetcher.Fetcher - -activeThreads=3,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:32,206 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=2
> > > > > 2013-02-06 18:45:33,173 INFO  fetcher.Fetcher - -activeThreads=2,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:34,173 INFO  fetcher.Fetcher - -activeThreads=2,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:34,205 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=1
> > > > > 2013-02-06 18:45:34,457 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=0
> > > > > 2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0
> > > > > 2013-02-06 18:45:35,742 WARN  mapred.LocalJobRunner - job_local_0015
> > > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > > any
> > > > valid local directory for output/file.out
> > > > >         at
> > > >
> > > >
> > >
> > org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> > > > >         at
> > > >
> > > >
> > >
> > org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> > > > >         at
> > > >
> > > >
> > >
> > org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> > > > >         at
> > > >
> > > >
> > >
> > org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
> > > > >         at
> > > >
> > > >
> > >
> > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
> > > > >         at
> > > >
> > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
> > > > >         at
> > > > org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
> > > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > > > >         at
> > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > > > >
> > > >
> > > > --
> > > > *Lewis*
> > > >
> > >
> >
> 

RE: Could not find any valid local directory for output/file.out

Posted by Markus Jelsma <ma...@openindex.io>.
Hadoop stores temporary files there, such as shuffled map output data, so you need it! But you can rm -r it after a complete crawl cycle. Do not clear it while a job is running, or the job will be missing its temp files.
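
A minimal sketch of that rule in Python, assuming the caller already knows whether a crawl job is still running. The names `clear_tmp_dir` and `job_running` are hypothetical and not part of Nutch or Hadoop; the directory passed in would be whatever hadoop.tmp.dir points to.

```python
import shutil
from pathlib import Path


def clear_tmp_dir(tmp_dir, job_running):
    """Delete everything inside tmp_dir unless a crawl job is still running.

    Returns the number of top-level entries removed.
    """
    if job_running:
        # Hadoop still needs its intermediate map output; leave it alone.
        return 0
    removed = 0
    for entry in Path(tmp_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it with all of its contents.
            shutil.rmtree(entry)
        else:
            # Plain file or symlink: remove only the entry itself.
            entry.unlink()
        removed += 1
    return removed
```

Because the function does nothing while a job is running, it is better suited to a wrapper that runs between crawl cycles than to a blind crontab entry.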
 
-----Original message-----
> From: Eyeris Rodriguez Rueda <er...@uci.cu>
> Sent: Fri 08-Feb-2013 20:53
> To: user@nutch.apache.org
> Subject: Re: Could not find any valid local directory for output/file.out
> 
> I'm using Ubuntu Server 12.04 only for Nutch, and I have assigned 40 GB to it. Is /tmp needed by the Nutch crawl process, or can I set up a crontab job to delete the contents of /tmp without causing problems for the crawl?

Re: Could not find any valid local directory for output/file.out

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
I'm using Ubuntu Server 12.04 only for Nutch, and I have assigned 40 GB to it. Is /tmp needed by the Nutch crawl process, or can I set up a crontab job to delete the contents of /tmp without causing problems for the crawl?




----- Original message -----
From: "Tejas Patil" <te...@gmail.com>
To: user@nutch.apache.org
Sent: Friday, February 8, 2013 14:33:25
Subject: Re: Could not find any valid local directory for output/file.out

I don't think there is any such property. Maybe it's time for you to clean up
/tmp :)

Thanks,
Tejas Patil


On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> Hi Lewis and Tejas again.
> I have set the hadoop.tmp.dir property, but Nutch is still consuming too much
> space for me.
> Is it possible to reduce the space Nutch uses in my tmp folder with some
> property of the fetcher process? I always get an exception because the hard
> disk is full. My crawldb is only 150 MB, no more, but my tmp folder keeps
> growing without control until it reaches 60 GB, and the crawl fails at that
> point. Any help, please.
>
>
>
>
> ----- Original message -----
> From: "Eyeris Rodriguez Rueda" <er...@uci.cu>
> To: user@nutch.apache.org
> Sent: Friday, February 8, 2013 10:45:52
> Subject: Re: Could not find any valid local directory for output/file.out
>
> Thanks a lot, Lewis and Tejas, you have been very helpful.
> It works: I pointed it to another partition and it is fine.
> Problem solved.
>
>
>
>
>
> ----- Original message -----
> From: "Tejas Patil" <te...@gmail.com>
> To: user@nutch.apache.org
> Sent: Thursday, February 7, 2013 16:32:33
> Subject: Re: Could not find any valid local directory for output/file.out
>
> On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
>
> > Thanks to all for your replies.
> > If I want to change the default location for Hadoop jobs (/tmp), where can
> > I do that? My nutch-site.xml does not include anything pointing to /tmp.
> >
> Add this property to nutch-site.xml with an appropriate value:
>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>XXXXXXXXXX</value>
> </property>
>
>
>
> > I have read about Nutch and Hadoop, but I am not sure I understand it all.
> > Is it possible to use Nutch 1.5.1 in distributed mode?
>
> yes
>
>
> > In that case, what do I need to do? I would really appreciate your answer,
> > because I can't find good documentation on this topic.
> >
> For distributed mode, Nutch is called from runtime/deploy. The conf files
> should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
> So modify runtime/local/conf/nutch-site.xml to set http.agent.name
> properly. I am assuming that the Hadoop setup is in place and the Hadoop
> variables are exported. Now, run the nutch commands from runtime/deploy.
>
> Thanks,
> Tejas Patil
>
> >
> >
> >
> > ----- Original message -----
> > From: "Tejas Patil" <te...@gmail.com>
> > To: user@nutch.apache.org
> > Sent: Thursday, February 7, 2013 14:04:26
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
> > Hadoop to store temporary data required for a job. If you don't override
> > hadoop.tmp.dir in any config file, it will use /tmp by default. In your
> > case, /tmp doesn't have ample space left, so it is better to override that
> > property and point it to some other location that has ample space.
> >
> > Thanks,
> > Tejas Patil
> >
> >
> > On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
> >
> > > Thanks, Lewis, for your answer.
> > > My question is why /tmp keeps growing while the crawl process is running,
> > > and why Nutch uses that folder. I'm using Nutch 1.5.1 in single mode and
> > > my nutch-site.xml does not set the hadoop.tmp.dir property. I need to
> > > reduce the space used by that folder because I only have 40 GB for the
> > > Nutch machine and 50 GB for the Solr machine. Any advice or explanation
> > > would be welcome.
> > > Thanks for your time.
> > >
> > >
> > >
> > > ----- Original message -----
> > > From: "Lewis John Mcgibbney" <le...@gmail.com>
> > > To: user@nutch.apache.org
> > > Sent: Thursday, February 7, 2013 13:06:11
> > > Subject: Re: Could not find any valid local directory for output/file.out
> > >
> > > Hi,
> > >
> > > https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching
> > >
> > > On Thursday, February 7, 2013, Eyeris Rodriguez Rueda <er...@uci.cu>
> > > wrote:
> > > > Hi all.
> > > > I have a problem when I run a crawl for a few hours or days. I'm using
> > > > Nutch 1.5.1 and Solr 3.6, but the crawl process fails and I don't know
> > > > how to fix it. I'm interested in running a crawl without limits, with 10
> > > > cycles or more, but I have a problem with hard disk space: I have found
> > > > that /etc/tmp has 29 GB used, which is not good for me. Can anybody help
> > > > me or give some advice on configuring Nutch so that at least one crawl
> > > > completes without problems?
> > > >
> > > > Here are some details of my environment:
> > > > RAM: 2 GB
> > > > CPU: quad-core (but I'm using only 2 cores)
> > > > Hard disk: 40 GB
> > > > Threads: 50
> > > > db.fetch.interval.default=2 days
> > > >
> > > >
> > > >
> > > > This is part of my log file when Nutch fails:
> > > >
> > > > ****************************************************************
> > > > 2013-02-06 18:45:25,961 INFO  fetcher.Fetcher - fetching
> > > http://bibliodoc.uci.cu/TD/TD_03349_10.pdf
> > > > 2013-02-06 18:45:25,964 INFO  fetcher.Fetcher - fetching
> > > http://bibliodoc.uci.cu/TD/TD_0442_07.pdf
> > > > 2013-02-06 18:45:25,977 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=49
> > > > 2013-02-06 18:45:26,109 INFO  fetcher.Fetcher - -activeThreads=49,
> > > spinWaiting=39, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:26,180 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=48
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=47
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=46
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=44
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=45
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=40
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=39
> > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=38
> > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=37
> > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=36
> > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=35
> > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=34
> > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=33
> > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=32
> > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=31
> > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=30
> > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=29
> > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=28
> > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=27
> > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=26
> > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=25
> > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=24
> > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=23
> > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=22
> > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=21
> > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=20
> > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=19
> > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=18
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=41
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=17
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=15
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=13
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=12
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=42
> > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=43
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=9
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=10
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=11
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=14
> > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=16
> > > > 2013-02-06 18:45:26,404 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=8
> > > > 2013-02-06 18:45:26,630 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=7
> > > > 2013-02-06 18:45:27,069 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=6
> > > > 2013-02-06 18:45:27,110 INFO  fetcher.Fetcher - -activeThreads=6,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:27,129 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=5
> > > > 2013-02-06 18:45:28,110 INFO  fetcher.Fetcher - -activeThreads=5,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:28,502 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=4
> > > > 2013-02-06 18:45:29,111 INFO  fetcher.Fetcher - -activeThreads=4,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:30,123 INFO  fetcher.Fetcher - -activeThreads=4,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:31,127 INFO  fetcher.Fetcher - -activeThreads=4,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:31,187 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=3
> > > > 2013-02-06 18:45:32,171 INFO  fetcher.Fetcher - -activeThreads=3,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:32,206 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=2
> > > > 2013-02-06 18:45:33,173 INFO  fetcher.Fetcher - -activeThreads=2,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:34,173 INFO  fetcher.Fetcher - -activeThreads=2,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:34,205 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=1
> > > > 2013-02-06 18:45:34,457 INFO  fetcher.Fetcher - -finishing thread
> > > FetcherThread, activeThreads=0
> > > > 2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0,
> > > spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0
> > > > 2013-02-06 18:45:35,742 WARN  mapred.LocalJobRunner - job_local_0015
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > any
> > > valid local directory for output/file.out
> > > >         at
> > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> > > >         at
> > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> > > >         at
> > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> > > >         at
> > >
> > >
> >
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
> > > >         at
> > >
> > >
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
> > > >         at
> > >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
> > > >         at
> > > org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
> > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > > >         at
> > >
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > > >
> > >
> > > --
> > > *Lewis*
> > >
> >
>

Re: Could not find any valid local directory for output/file.out

Posted by Tejas Patil <te...@gmail.com>.
I don't think there is any such property. Maybe it's time for you to clean up
/tmp :)

Thanks,
Tejas Patil


On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> Hi Lewis and Tejas again.
> I have set the hadoop.tmp.dir property, but Nutch is still consuming too
> much space for me.
> Is it possible to reduce the space Nutch uses in my tmp folder with some
> property of the fetcher process? I always get an exception because the hard
> disk is full. My crawldb is only 150 MB, no more, but my tmp folder keeps
> growing without control up to 60 GB, and the crawl fails at that point.
> Please, any help is welcome.

Re: Could not find any valid local directory for output/file.out

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Hi Lewis and Tejas again.
I have set the hadoop.tmp.dir property, but Nutch is still consuming too much space for me.
Is it possible to reduce the space Nutch uses in my tmp folder with some property of the fetcher process? I always get an exception because the hard disk is full. My crawldb is only 150 MB, no more, but my tmp folder keeps growing without control up to 60 GB, and the crawl fails at that point.
Please, any help is welcome.




----- Original message -----
From: "Eyeris Rodriguez Rueda" <er...@uci.cu>
To: user@nutch.apache.org
Sent: Friday, February 8, 2013 10:45:52
Subject: Re: Could not find any valid local directory for output/file.out

Thanks a lot, Lewis and Tejas, you have been very helpful.
It works fine now that I have pointed it to another partition.
Problem solved.





----- Original message -----
From: "Tejas Patil" <te...@gmail.com>
To: user@nutch.apache.org
Sent: Thursday, February 7, 2013 16:32:33
Subject: Re: Could not find any valid local directory for output/file.out

On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> Thanks to all for your replies.
> If I want to change the default location for Hadoop jobs (/tmp), where can
> I do that? My nutch-site.xml does not include anything pointing to /tmp.
>
Add this property to nutch-site.xml with an appropriate value:

<property>
<name>hadoop.tmp.dir</name>
<value>XXXXXXXXXX</value>
</property>
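
As a rough sketch of where this property sits, a complete nutch-site.xml might look like the following. The path /data/nutch-tmp is only a placeholder assumption for a directory on a partition with enough free space; it does not come from the original message.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: local overrides for Nutch and Hadoop settings -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Hypothetical path; replace it with a directory on a partition with ample space -->
    <value>/data/nutch-tmp</value>
  </property>
</configuration>
```

With Hadoop's default settings, the local job directories are resolved relative to hadoop.tmp.dir, so temporary job data should then be written under the new location instead of /tmp.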



> I have read about Nutch and Hadoop, but I am not sure I understand it
> all. Is it possible to use Nutch 1.5.1 in distributed mode?

Yes.


> In that case, what do I need to do? I would really appreciate your answer,
> because I can't find good documentation on this topic.
>
For distributed mode, Nutch is called from runtime/deploy. The conf files
should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
So modify runtime/local/conf/nutch-site.xml to set http.agent.name
properly. I am assuming that the Hadoop setup is in place and the Hadoop
variables are exported. Now, run the Nutch commands from runtime/deploy.
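
For illustration, setting http.agent.name in runtime/local/conf/nutch-site.xml could look like the sketch below. The agent name MyNutchCrawler is a made-up placeholder, not a value from this thread.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- Hypothetical crawler name; choose one that identifies your own crawler -->
    <value>MyNutchCrawler</value>
  </property>
</configuration>
```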

Thanks,
Tejas Patil

>
>
>
> ----- Original message -----
> From: "Tejas Patil" <te...@gmail.com>
> To: user@nutch.apache.org
> Sent: Thursday, February 7, 2013 14:04:26
> Subject: Re: Could not find any valid local directory for output/file.out
>
> Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
> Hadoop to store temporary data required for a job. If you don't override
> hadoop.tmp.dir in any config file, it will use /tmp by default. In your
> case, /tmp doesn't have ample space left, so it is better to override that
> property and point it to some other location which has ample space.
>
> Thanks,
> Tejas Patil
>
>
> On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <erueda@uci.cu>
> wrote:
>
> > Thanks, Lewis, for your answer.
> > My question is why /tmp keeps growing while the crawl process is running,
> > and why Nutch uses that folder. I'm using Nutch 1.5.1 in single mode and
> > my nutch-site.xml does not have the hadoop.tmp.dir property. I need to
> > reduce the space used by that folder because I only have 40 GB for the
> > Nutch machine and 50 GB for the Solr machine. Any advice or explanation
> > will be appreciated.
> > Thanks for your time.
> >
> >
> >
> > ----- Original message -----
> > From: "Lewis John Mcgibbney" <le...@gmail.com>
> > To: user@nutch.apache.org
> > Sent: Thursday, February 7, 2013 13:06:11
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > Hi,
> >
> >
> >
> > https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching
> >
> > On Thursday, February 7, 2013, Eyeris Rodriguez Rueda <er...@uci.cu>
> > wrote:
> > > Hi all.
> > > I have a problem when I run a crawl for a few hours or days. I'm using
> > > Nutch 1.5.1 and Solr 3.6, but the crawl process fails and I don't know
> > > how to fix it. I'm interested in running a crawl without limits, with 10
> > > cycles or more, but I have a problem with hard disk space: I have
> > > detected that /etc/tmp has 29 GB used, which is not good for me. Can
> > > anybody help me or give some advice on configuring Nutch to complete at
> > > least one crawl process without problems?
> > >
> > > Here are some features of my environment:
> > > RAM: 2 GB
> > > CPU: quad-core (but I'm using only 2 cores)
> > > Hard disk: 40 GB
> > > Threads: 50
> > > db.fetch.interval.default=2 days
> > >
> > >
> > >
> > > This is part of my log file when Nutch fails:
> > >
> > > ****************************************************************
> > > 2013-02-06 18:45:25,961 INFO  fetcher.Fetcher - fetching
> > http://bibliodoc.uci.cu/TD/TD_03349_10.pdf
> > > 2013-02-06 18:45:25,964 INFO  fetcher.Fetcher - fetching
> > http://bibliodoc.uci.cu/TD/TD_0442_07.pdf
> > > 2013-02-06 18:45:25,977 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=49
> > > 2013-02-06 18:45:26,109 INFO  fetcher.Fetcher - -activeThreads=49,
> > spinWaiting=39, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:26,180 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=48
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=47
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=46
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=44
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=45
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=40
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=39
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=38
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=37
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=36
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=35
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=34
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=33
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=32
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=31
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=30
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=29
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=28
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=27
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=26
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=25
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=24

Re: Could not find any valid local directory for output/file.out

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Thanks a lot, Lewis and Tejas; you have been very helpful.
It works now: I pointed the temporary directory to another partition and the crawl runs fine.
Problem solved.






Re: Could not find any valid local directory for output/file.out

Posted by Tejas Patil <te...@gmail.com>.
On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> Thanks to all for your replies.
> If I want to change the default location for Hadoop jobs (/tmp), where can
> I do that? My nutch-site.xml does not include anything pointing to /tmp.
>
Add this property to nutch-site.xml with an appropriate value:

<property>
<name>hadoop.tmp.dir</name>
<value>XXXXXXXXXX</value>
</property>
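
For illustration only, a filled-in version of the property above might look like the sketch below. The path /data/nutch/hadoop-tmp is a hypothetical example, not a value from this thread; substitute any directory on a partition with enough free space that the user running Nutch can write to.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides (sketch) -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Hypothetical path: pick a directory on a partition with ample free space -->
    <value>/data/nutch/hadoop-tmp</value>
  </property>
</configuration>
```

If nutch-site.xml already contains a <configuration> element, only the <property> block needs to be added inside it.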



> I have read about Nutch and Hadoop, but I am not sure I understand it
> all. Is it possible to use Nutch 1.5.1 in distributed mode?

Yes.


> In that case, what do I need to do? I would really appreciate your answer
> because I can't find good documentation on this topic.
>
For distributed mode, Nutch is called from runtime/deploy. The conf files
should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
So modify runtime/local/conf/nutch-site.xml to set http.agent.name
properly. I am assuming that the Hadoop setup is in place and the Hadoop
variables are exported. Now, run the Nutch commands from runtime/deploy.
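
As a rough sketch of the http.agent.name setting mentioned above, the entry in nutch-site.xml could look like this. The value MyNutchCrawler is a made-up placeholder, not a name taken from this thread; use whatever name should identify your crawler to the sites it fetches.

```xml
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- Hypothetical crawler name; replace with your own -->
    <value>MyNutchCrawler</value>
  </property>
</configuration>
```

Other overrides, such as the hadoop.tmp.dir property shown earlier in this message, go in the same <configuration> element.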

Thanks,
Tejas Patil


Re: Could not find any valid local directory for output/file.out

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Thanks to all for your replies.
If I want to change the default location for Hadoop jobs (/tmp), where can I do that? My nutch-site.xml does not include anything pointing to /tmp.
I have read about Nutch and Hadoop, but I am not sure I understand it all. Is it possible to use Nutch 1.5.1 in distributed mode? In that case, what do I need to do? I would really appreciate your answer because I can't find good documentation on this topic.




----- Original message -----
From: "Tejas Patil" <te...@gmail.com>
To: user@nutch.apache.org
Sent: Thursday, February 7, 2013 14:04:26
Subject: Re: Could not find any valid local directory for output/file.out

Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
Hadoop to store temporary data required for a job. If you don't override
hadoop.tmp.dir in any config file, it will use /tmp by default. In your
case, /tmp doesn't have enough space left, so it is better to override that
property and point it to some other location that has ample space.
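
For reference, the defaults behind this behaviour look roughly like the excerpts below. This is a sketch assuming the Hadoop 1.x line that Nutch 1.5.1 is understood to bundle; the values are quoted from memory of that release's core-default.xml and mapred-default.xml, so check the files shipped with your own installation. The stack trace goes through LocalDirAllocator, which picks a directory from mapred.local.dir for map output, and that property in turn derives from hadoop.tmp.dir.

```xml
<!-- Excerpt from core-default.xml (Hadoop 1.x, assumed) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>

<!-- Excerpt from mapred-default.xml (Hadoop 1.x, assumed) -->
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
```

Because mapred.local.dir is defined relative to hadoop.tmp.dir, overriding hadoop.tmp.dir alone is normally enough to move the map output off /tmp.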

Thanks,
Tejas Patil


On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> Thanks, Lewis, for your answer.
> My question is why /tmp keeps growing while the crawl is running, and why
> Nutch uses that folder. I'm using Nutch 1.5.1 in single mode and my
> nutch-site.xml does not set the hadoop.tmp.dir property. I need to reduce
> the space used by that folder because I only have 40 GB on the Nutch machine
> and 50 GB on the Solr machine. Any advice or explanation would be welcome.
> Thanks for your time.
>
>
>
> ----- Original message -----
> From: "Lewis John Mcgibbney" <le...@gmail.com>
> To: user@nutch.apache.org
> Sent: Thursday, February 7, 2013 13:06:11
> Subject: Re: Could not find any valid local directory for output/file.out
>
> Hi,
>
>
> https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching
>
> On Thursday, February 7, 2013, Eyeris Rodriguez Rueda <er...@uci.cu>
> wrote:
> > Hi all.
> > I have a problem when i do a crawl for few hour or days, im using nutch
> 1.5.1 and solr 3.6, but the crawl process fails and i dont know how to fix
> this problem, im intersted in make a crawl process without limit with 10
> cicles or more but i have problem with space on hard disk, i have detected
> that /etc/tmp have 29 GB used and is not good for me, any body can help me
> or give some advices for configure nutch to make at least one crawl process
> without problems ?
> >
> > here some features of my environment
> > Ram 2 GB
> > CPU:QuadCore(but im using only 2 cores)
> > Hard Disk:40 GB
> > Threads:50
> > db.fetch.interval.default=2 days
> >
> >
> >
> > this is a part of my log file when nutch fails:
> >
> > ****************************************************************
> > 2013-02-06 18:45:25,961 INFO  fetcher.Fetcher - fetching
> http://bibliodoc.uci.cu/TD/TD_03349_10.pdf
> > 2013-02-06 18:45:25,964 INFO  fetcher.Fetcher - fetching
> http://bibliodoc.uci.cu/TD/TD_0442_07.pdf
> > 2013-02-06 18:45:25,977 INFO  fetcher.Fetcher - -finishing thread
> FetcherThread, activeThreads=49
> > 2013-02-06 18:45:26,109 INFO  fetcher.Fetcher - -activeThreads=49,
> spinWaiting=39, fetchQueues.totalSize=0

Re: Could not find any valid local directory for output/file.out

Posted by Tejas Patil <te...@gmail.com>.
Nutch jobs are executed by Hadoop. By default, Hadoop stores the temporary
data required for a job under "/tmp". If you don't override hadoop.tmp.dir
in any config file, that default is used. In your case, /tmp doesn't have
enough space left, so it is better to override that property and point it
to some other location that has ample space.
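
As a minimal sketch of that override, assuming a local-mode setup where the
property is placed in conf/nutch-site.xml (the directory /data/hadoop-tmp is
a hypothetical path, not something from this thread; any location with
enough free space would do):

```xml
<!-- Goes inside the <configuration> element of conf/nutch-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <!-- Hypothetical path: replace with a directory on a partition that has free space -->
  <value>/data/hadoop-tmp</value>
</property>
```

It is probably safest to create that directory beforehand and make sure the
user running the crawl can write to it.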

Thanks,
Tejas Patil


On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:

> Thanks, Lewis, for your answer.
> My question is why /tmp keeps growing while the crawl process is running, and
> why Nutch uses that folder. I'm using Nutch 1.5.1 in single mode, and my
> nutch-site.xml does not define the hadoop.tmp.dir property. I need to reduce
> the space used by that folder because I only have 40 GB on the Nutch machine
> and 50 GB on the Solr machine. Any advice or explanation would be welcome.
> Thanks for your time.

Re: Could not find any valid local directory for output/file.out

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Thanks, Lewis, for your answer.
My question is why /tmp keeps growing while the crawl process is running, and why Nutch uses that folder. I'm using Nutch 1.5.1 in single mode, and my nutch-site.xml does not define the hadoop.tmp.dir property. I need to reduce the space used by that folder because I only have 40 GB on the Nutch machine and 50 GB on the Solr machine. Any advice or explanation would be welcome.
Thanks for your time.




Re: Could not find any valid local directory for output/file.out

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,

https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching


-- 
*Lewis*