Posted to user@nutch.apache.org by "Insurance Squared Inc." <gc...@insurancesquared.com> on 2005/12/30 09:57:54 UTC

Nutch freezing on fetch

Hi All,

We're experiencing problems with Nutch freezing sporadically when
fetching.  We're not really too sure where to even start investigating.  Some
digging into the archives suggested memory issues, so we did the following:

TOMCAT_OPTS=" -Xmx1024M" to increase Tomcat memory and 
NUTCH_HEAPSIZE=1500 to increase Nutch memory.

No effect - it's still freezing. We're running on a current version of
Mandrake Linux on its own box, 2 gigs of RAM, 2.4 GHz P4. I believe
we're running version 0.7 of Nutch. Does anyone have any suggestions as to
where I should start investigating?

Thanks.

Re: Nutch freezing on fetch

Posted by "Insurance Squared Inc." <gc...@insurancesquared.com>.

Andrzej Bialecki wrote:

> Insurance Squared Inc. wrote:
>
>> Hi All,
>>
>> We're experiencing problems with Nutch freezing sporadically when
>> fetching.  We're not really too sure where to even start investigating.
>> Some digging into the archives suggested memory issues, so we did the
>> following:
>>
>> TOMCAT_OPTS=" -Xmx1024M" to increase Tomcat memory and
>> NUTCH_HEAPSIZE=1500 to increase Nutch memory.
>>
>> No effect - it's still freezing. We're running on a current version of
>> Mandrake Linux on its own box, 2 gigs of RAM, 2.4 GHz P4. I believe
>> we're running version 0.7 of Nutch. Does anyone have any suggestions as
>> to where I should start investigating?
>
>
>
> Do you use the parse-pdf plugin? Please do a thread dump of the stuck 
> process (Ctrl-E, if I'm not mistaken).
>

Re: Nutch freezing on fetch

Posted by Andrzej Bialecki <ab...@getopt.org>.
Howie Wang wrote:

> I was wondering how to recover from a bad fetch. Should I consider
> the segment corrupt and just delete it? Then should I reset the
> fetch date in the webdb so that it will refetch it?

Unfinished segments are ok, you can use them for further processing. Of 
course, the parts that are not fetched won't be processed at all, so 
those Pages in WebDB won't get updated and you will have to wait another 
week (or use -adddays).

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Nutch freezing on fetch

Posted by Howie Wang <ho...@hotmail.com>.
I've been getting periodic freezes during fetching also. I tracked
down one of the causes to a Java regular expression in my parse
filters. Java's regex support has been a source of lots of frustration
for me. But I'm not positive this was the only cause since I've
frozen on URLs that I couldn't reproduce on my test box. I'm on
JDK 1.4.2, by the way.
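
One way to keep a runaway regular expression from hanging a fetcher thread is to give each match a time budget. The following is only a minimal sketch, not anything from Nutch itself: the class and method names (RegexDeadlineSketch, DeadlineCharSequence, findWithin) are made up for illustration, and it is written for a newer JDK than 1.4.2. It relies on the fact that java.util.regex reads its input through CharSequence.charAt, so a wrapper can abort a match once a deadline has passed.

```java
import java.util.regex.Pattern;

final class RegexDeadlineSketch {

    /** Thrown when a match runs past its deadline (hypothetical type). */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    /** CharSequence view that aborts matching once the deadline has passed. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // The regex engine calls charAt for every character it examines,
            // so this check also fires in the middle of heavy backtracking.
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new RegexTimeoutException("regex match exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Runs pattern.find() on input, giving up after timeoutMillis. */
    static boolean findWithin(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).find();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("fo+");
        System.out.println(findWithin(p, "xx foo", 1000));
    }
}
```

A parse filter could catch the timeout exception, log the offending URL, and skip that page instead of stalling the whole fetch.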

I was wondering how to recover from a bad fetch. Should I consider
the segment corrupt and just delete it? Then should I reset the
fetch date in the webdb so that it will refetch it?

Howie




Re: Nutch freezing on fetch

Posted by Andrzej Bialecki <ab...@getopt.org>.
Insurance Squared Inc. wrote:

> Hi,
>
> Here's the output from when it freezes.  Sorry it's a bit verbose, 
> wasn't sure what we're looking for so I've included it all:
> 051230 131519 fetching 
> http://www.municipalaffairs.gov.ab.ca/fco/pdf/ab-clan6-1.pdf
> Full thread dump Java HotSpot(TM) Client VM (1.4.2_10-b03 mixed mode):
>
> "fetcher9" prio=1 tid=0x082f5b40 nid=0x1442 waiting for monitor entry 
> [5097a000..5097a228]
>       at 
> org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277) 
>
>       - waiting to lock <0x590efa58> (a 
> org.apache.nutch.io.ArrayFile$Writer)


[...]

> "fetcher1" prio=1 tid=0x081640c0 nid=0x1442 runnable [50d86000..50d87228]
>       at java.util.zip.Deflater.deflateBytes(Native Method)
>       at java.util.zip.Deflater.deflate(Deflater.java:287)
>       - locked <0x61da0728> (a java.util.zip.Deflater)
>       at 
> java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:154)
>       at 
> java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:114)
>       at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:72)
>       - locked <0x61da0960> (a java.util.zip.GZIPOutputStream)
>       at 
> org.apache.nutch.io.WritableUtils.writeCompressedByteArray(WritableUtils.java:53) 
>
>       at org.apache.nutch.protocol.Content.write(Content.java:81)
>       at 
> org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
>       at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
>       - locked <0x590efa90> (a org.apache.nutch.io.ArrayFile$Writer)
>       at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
>       - locked <0x590efa90> (a org.apache.nutch.io.ArrayFile$Writer)
>       at 
> org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:278) 
>
>       - locked <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)

             ^^^^^^^^^^^^^^^^

>       at 
> org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261) 
>
>       at 
> org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)


This is very strange... It looks like all the other threads are waiting
for this thread to finish compressing the data, but there is nothing in
WritableUtils, or even in the source of Deflater.java, to suggest what is
happening... Could you connect a debugger and run the process under a
debugger? It would also help to get a couple of thread dumps and compare
them, to see if they look the same...
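
To compare several dumps without reading them by eye, the lock lines can be pulled out mechanically. The following is a rough sketch only: the class name LockContentionSketch, its methods, and the SAMPLE constant (a shortened excerpt of the dump in this thread) are illustrative and not part of Nutch or the JDK. It groups thread names by the monitor address they hold or wait for, based on the "- locked" and "- waiting to lock" lines that HotSpot prints.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LockContentionSketch {

    // Shortened excerpt of the dump posted in this thread.
    static final String SAMPLE =
        "\"fetcher9\" prio=1 tid=0x082f5b40 nid=0x1442 waiting for monitor entry\n"
        + "       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)\n"
        + "       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)\n"
        + "\n"
        + "\"fetcher1\" prio=1 tid=0x081640c0 nid=0x1442 runnable\n"
        + "       at java.util.zip.Deflater.deflateBytes(Native Method)\n"
        + "       - locked <0x61da0728> (a java.util.zip.Deflater)\n"
        + "       - locked <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)\n";

    private static final Pattern THREAD_HEADER = Pattern.compile("^\"([^\"]+)\"");
    private static final Pattern LOCK_LINE =
        Pattern.compile("- (waiting to lock|locked) <(0x[0-9a-fA-F]+)>");

    /** Maps each monitor address to the threads blocked waiting for it. */
    static Map<String, List<String>> waiters(String dump) {
        return collect(dump, "waiting to lock");
    }

    /** Maps each monitor address to the threads that currently hold it. */
    static Map<String, List<String>> holders(String dump) {
        return collect(dump, "locked");
    }

    private static Map<String, List<String>> collect(String dump, String kind) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String current = null; // name of the thread whose stack is being read
        for (String rawLine : dump.split("\\r?\\n")) {
            String line = rawLine.trim();
            Matcher header = THREAD_HEADER.matcher(line);
            if (header.find()) {
                current = header.group(1);
                continue;
            }
            Matcher lock = LOCK_LINE.matcher(line);
            if (current != null && lock.find() && lock.group(1).equals(kind)) {
                List<String> names =
                    result.computeIfAbsent(lock.group(2), k -> new ArrayList<>());
                if (!names.contains(current)) {
                    names.add(current);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("holders: " + holders(SAMPLE));
        System.out.println("waiters: " + waiters(SAMPLE));
    }
}
```

If the same thread holds the same monitor across dumps taken a few seconds apart, and its top frames do not change, that thread is the one to look at.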

BTW, if you have JDK 1.5 on that machine, you could try running this
with 1.5 and see if it helps.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Nutch freezing on fetch

Posted by "Insurance Squared Inc." <gc...@insurancesquared.com>.
Hi,

Here's the output from when it freezes.  Sorry it's a bit verbose, 
wasn't sure what we're looking for so I've included it all:
051230 131519 fetching http://www.municipalaffairs.gov.ab.ca/fco/pdf/ab-clan6-1.pdf
Full thread dump Java HotSpot(TM) Client VM (1.4.2_10-b03 mixed mode):

"fetcher9" prio=1 tid=0x082f5b40 nid=0x1442 waiting for monitor entry [5097a000..5097a228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher8" prio=1 tid=0x08165ed8 nid=0x1442 waiting for monitor entry [509fb000..509fb228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher7" prio=1 tid=0x0816bb28 nid=0x1442 waiting for monitor entry [50a7c000..50a7c228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher6" prio=1 tid=0x0816b9c0 nid=0x1442 waiting for monitor entry [50afd000..50afd228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher5" prio=1 tid=0x081672f0 nid=0x1442 waiting for monitor entry [50b7e000..50b7e228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher4" prio=1 tid=0x082f73a0 nid=0x1442 waiting for monitor entry [50bff000..50bff228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher3" prio=1 tid=0x08168768 nid=0x1442 waiting for monitor entry [50c80000..50c80228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher2" prio=1 tid=0x08168320 nid=0x1442 waiting for monitor entry [50d01000..50d01228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher1" prio=1 tid=0x081640c0 nid=0x1442 runnable [50d86000..50d87228]
       at java.util.zip.Deflater.deflateBytes(Native Method)
       at java.util.zip.Deflater.deflate(Deflater.java:287)
       - locked <0x61da0728> (a java.util.zip.Deflater)
       at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:154)
       at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:114)
       at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:72)
       - locked <0x61da0960> (a java.util.zip.GZIPOutputStream)
       at org.apache.nutch.io.WritableUtils.writeCompressedByteArray(WritableUtils.java:53)
       at org.apache.nutch.protocol.Content.write(Content.java:81)
       at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
       at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
       - locked <0x590efa90> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
       - locked <0x590efa90> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:278)
       - locked <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher0" prio=1 tid=0x08163d68 nid=0x1442 waiting for monitor entry [50e08000..50e08228]
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
       - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"Signal Dispatcher" daemon prio=1 tid=0x080974d0 nid=0x1442 runnable [0..0]

"Finalizer" daemon prio=1 tid=0x08092448 nid=0x1442 in Object.wait() [51588000..51588228]
       at java.lang.Object.wait(Native Method)
       - waiting on <0x58fe8c40> (a java.lang.ref.ReferenceQueue$Lock)
       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
       - locked <0x58fe8c40> (a java.lang.ref.ReferenceQueue$Lock)
       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
       at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x080918a0 nid=0x1442 in Object.wait() [51609000..51609228]
       at java.lang.Object.wait(Native Method)
       - waiting on <0x58fe8ca8> (a java.lang.ref.Reference$Lock)
       at java.lang.Object.wait(Object.java:429)
       at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
       - locked <0x58fe8ca8> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0805be28 nid=0x1442 waiting on condition [bfffd000..bfffd2dc]
       at java.lang.Thread.sleep(Native Method)
       at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:351)
       at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:488)

"VM Thread" prio=1 tid=0x08090638 nid=0x1442 runnable

"VM Periodic Task Thread" prio=1 tid=0x08099cd8 nid=0x1442 waiting on condition

"Suspend Checker Thread" prio=1 tid=0x08096b28 nid=0x1442 runnable


Thanks again.




Re: Nutch freezing on fetch

Posted by Andrzej Bialecki <ab...@getopt.org>.
Insurance Squared Inc. wrote:

> Hi All,
>
> We're experiencing problems with Nutch freezing sporadically when
> fetching.  We're not really too sure where to even start investigating.  Some
> digging into the archives suggested memory issues, so we did the
> following:
>
> TOMCAT_OPTS=" -Xmx1024M" to increase Tomcat memory and
> NUTCH_HEAPSIZE=1500 to increase Nutch memory.
>
> No effect - it's still freezing. We're running on a current version of
> Mandrake Linux on its own box, 2 gigs of RAM, 2.4 GHz P4. I believe
> we're running version 0.7 of Nutch. Does anyone have any suggestions as
> to where I should start investigating?


Do you use the parse-pdf plugin? Please do a thread dump of the stuck 
process (Ctrl-E, if I'm not mistaken).
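
For reference, on Linux a HotSpot JVM normally prints a thread dump when the process receives SIGQUIT (for example Ctrl-\ in the terminal that runs it). Where a signal is awkward to send, a dump can also be produced from inside the JVM. The sketch below is only an illustration with a made-up class name (ThreadDumpSketch). It assumes Java 5 or later, because Thread.getAllStackTraces() does not exist in 1.4.2, and unlike a real HotSpot dump it does not show which monitors are held.

```java
import java.util.Map;

final class ThreadDumpSketch {

    /** Returns the name, state and stack of every live thread as text. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("       at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

A watchdog thread could call dumpAllThreads() and log the result whenever no page has been fetched for some time, which would give several dumps to compare.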

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com