Posted to user@nutch.apache.org by "Insurance Squared Inc." <gc...@insurancesquared.com> on 2005/12/30 09:57:54 UTC
Nutch freezing on fetch
Hi All,
We're experiencing problems with Nutch freezing sporadically when
fetching, and we're not really sure where to even start investigating.
Some digging into the archives suggested memory issues, so we did the following:
TOMCAT_OPTS=" -Xmx1024M" to increase Tomcat memory and
NUTCH_HEAPSIZE=1500 to increase Nutch memory.
No effect - it's still freezing. We're running a current version of
Mandrake Linux on its own box, 2 GB of RAM, 2.4 GHz P4. I believe
we're running version 0.7 of Nutch. Does anyone have any suggestions as to
where I should start investigating?
Thanks.
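A quick way to check whether the memory setting actually reaches the fetcher: the thread dump later in this thread shows the fetcher running as a standalone JVM (its main thread is in Fetcher.main), so TOMCAT_OPTS, which only affects Tomcat, would not apply to it. The small program below (the class name HeapCheck is hypothetical) prints the heap limits of the JVM it runs in. Starting it with the same -Xmx value the fetcher gets, assuming bin/nutch passes NUTCH_HEAPSIZE to java as -Xmx<value>m, shows the limit the fetcher would actually have.

```java
// Hypothetical helper: report the heap limits of the JVM this runs in.
// Example: java -Xmx1500m HeapCheck
class HeapCheck {

    // Convert a byte count to whole megabytes (rounding down).
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx, or the JVM's default when -Xmx is absent.
        System.out.println("max heap   (MB): " + toMb(rt.maxMemory()));
        // totalMemory() is what the JVM has reserved so far; freeMemory() is unused space in it.
        System.out.println("total heap (MB): " + toMb(rt.totalMemory()));
        System.out.println("free heap  (MB): " + toMb(rt.freeMemory()));
    }
}
```

Runtime.maxMemory() can report slightly less than the -Xmx value on some JVMs, so expect a figure close to, not exactly, 1500.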
Re: Nutch freezing on fetch
Posted by "Insurance Squared Inc." <gc...@insurancesquared.com>.
Andrzej Bialecki wrote:
> Insurance Squared Inc. wrote:
>
>> Hi All,
>>
>> We're experiencing problems with Nutch freezing sporadically when
>> fetching, and we're not really sure where to even start investigating.
>> Some digging into the archives suggested memory issues, so we did the
>> following:
>>
>> TOMCAT_OPTS=" -Xmx1024M" to increase Tomcat memory and
>> NUTCH_HEAPSIZE=1500 to increase Nutch memory.
>>
>> No effect - it's still freezing. We're running a current version of
>> Mandrake Linux on its own box, 2 GB of RAM, 2.4 GHz P4. I believe
>> we're running version 0.7 of Nutch. Does anyone have any suggestions as
>> to where I should start investigating?
>
>
>
> Do you use the parse-pdf plugin? Please take a thread dump of the stuck
> process (on Linux, press Ctrl-\ in its console or run kill -3 <pid>).
>
Re: Nutch freezing on fetch
Posted by Andrzej Bialecki <ab...@getopt.org>.
Howie Wang wrote:
> I was wondering how to recover from a bad fetch. Should I consider
> the segment corrupt and just delete it? Then should I reset the
> fetch date in the webdb so that it will refetch it?
Unfinished segments are OK; you can use them for further processing. Of
course, the parts that were not fetched won't be processed at all, so
those Pages in WebDB won't get updated, and you will have to wait another
week (or use -adddays).
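To illustrate the -adddays point, here is a simplified model, assuming the generator selects a page when its scheduled next-fetch time is at or before the current time shifted forward by the given number of days. The class and method names are hypothetical; this is not Nutch's code.

```java
// Simplified, hypothetical model of how -adddays widens the set of pages
// that a newly generated fetchlist will include. Not Nutch's source.
class AddDaysModel {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // A page counts as due when its next-fetch time is no later than "now",
    // where "now" has been pushed forward by addDays days.
    static boolean isDue(long nextFetchMs, long nowMs, int addDays) {
        long effectiveNow = nowMs + addDays * DAY_MS;
        return nextFetchMs <= effectiveNow;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long nextFetch = now + 5 * DAY_MS; // page scheduled five days from now
        System.out.println("due without -adddays: " + isDue(nextFetch, now, 0));
        System.out.println("due with -adddays 7:  " + isDue(nextFetch, now, 7));
    }
}
```

Under this model, pages from an unfinished segment that were never fetched keep their old schedule, and shifting "now" forward is what lets them be picked up early.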
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: Nutch freezing on fetch
Posted by Howie Wang <ho...@hotmail.com>.
I've been getting periodic freezes during fetching too. I tracked
down one of the causes to a Java regular expression in my parse
filters; Java's regex support has been a source of lots of frustration
for me. But I'm not positive this was the only cause, since I've also
had freezes on URLs that I couldn't reproduce on my test box. I'm on
JDK 1.4.2, by the way.
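The java.util.regex engine backtracks, and some patterns take exponential time on certain inputs, which looks exactly like a frozen fetcher thread. One common defensive workaround (not something Nutch or the JDK provides) is to hand the matcher a CharSequence that checks a deadline on every character read. DeadlineCharSequence and findWithTimeout() below are hypothetical names; in a parse filter you would call findWithTimeout() instead of matching the page text directly, and treat the exception as "skip this filter for this page".

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a CharSequence whose charAt() throws once a deadline
// has passed. The matcher reads its input through charAt(), so a runaway
// match is cut short instead of hanging the calling thread.
class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineMs;

    DeadlineCharSequence(CharSequence inner, long deadlineMs) {
        this.inner = inner;
        this.deadlineMs = deadlineMs;
    }

    public char charAt(int index) {
        if (System.currentTimeMillis() > deadlineMs) {
            throw new IllegalStateException("regex match exceeded its time budget");
        }
        return inner.charAt(index);
    }

    public int length() {
        return inner.length();
    }

    public CharSequence subSequence(int start, int end) {
        // Sub-sequences share the same deadline.
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineMs);
    }

    public String toString() {
        return inner.toString();
    }

    // Runs Matcher.find() on text; throws IllegalStateException after timeoutMs.
    static boolean findWithTimeout(Pattern pattern, String text, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        Matcher m = pattern.matcher(new DeadlineCharSequence(text, deadline));
        return m.find();
    }

    public static void main(String[] args) {
        // "(a+)+b" is a classic pattern that a backtracking engine can take
        // exponential time on when given a long run of 'a' with no 'b'.
        Pattern nested = Pattern.compile("(a+)+b");
        char[] as = new char[40];
        Arrays.fill(as, 'a');
        try {
            System.out.println("matched: " + findWithTimeout(nested, new String(as), 200));
        } catch (IllegalStateException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

Reading the clock on every charAt() call adds overhead, so this is best reserved for patterns applied to arbitrary page text.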
I was wondering how to recover from a bad fetch. Should I consider
the segment corrupt and just delete it? Then should I reset the
fetch date in the webdb so that it will refetch it?
Howie
Re: Nutch freezing on fetch
Posted by Andrzej Bialecki <ab...@getopt.org>.
Insurance Squared Inc. wrote:
> Hi,
>
> Here's the output from when it freezes. Sorry it's a bit verbose; we
> weren't sure what we're looking for, so I've included it all:
> 051230 131519 fetching http://www.municipalaffairs.gov.ab.ca/fco/pdf/ab-clan6-1.pdf
> Full thread dump Java HotSpot(TM) Client VM (1.4.2_10-b03 mixed mode):
>
> "fetcher9" prio=1 tid=0x082f5b40 nid=0x1442 waiting for monitor entry [5097a000..5097a228]
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
>     - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
[...]
> "fetcher1" prio=1 tid=0x081640c0 nid=0x1442 runnable [50d86000..50d87228]
>     at java.util.zip.Deflater.deflateBytes(Native Method)
>     at java.util.zip.Deflater.deflate(Deflater.java:287)
>     - locked <0x61da0728> (a java.util.zip.Deflater)
>     at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:154)
>     at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:114)
>     at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:72)
>     - locked <0x61da0960> (a java.util.zip.GZIPOutputStream)
>     at org.apache.nutch.io.WritableUtils.writeCompressedByteArray(WritableUtils.java:53)
>     at org.apache.nutch.protocol.Content.write(Content.java:81)
>     at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
>     at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
>     - locked <0x590efa90> (a org.apache.nutch.io.ArrayFile$Writer)
>     at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
>     - locked <0x590efa90> (a org.apache.nutch.io.ArrayFile$Writer)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:278)
>     - locked <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
        ^^^^^^^^^^^^^^^^^^^
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)
This is very strange... It looks like all threads are waiting for this
thread to finish compressing the data, but there is nothing in
WritableUtils, or even in the source of Deflater.java, to suggest what is
happening... Could you run the process under a debugger? It would also
help to take a couple of thread dumps and compare them, to see whether
they look the same...
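As a rough in-process aid to comparing dumps, the sketch below takes two stack snapshots a few seconds apart and lists threads whose stacks are identical in both. It needs Java 5 or later (Thread.getAllStackTraces() does not exist on the 1.4.2 JVM in this thread), the class and method names are hypothetical, and main() only samples its own small JVM; to be useful, snapshot() and unchanged() would have to be called from inside the fetcher process. An unchanged stack is only a hint: threads parked in sleep() or wait() look identical too, so the interesting case is a RUNNABLE thread, like fetcher1 above, sitting in the same frame across dumps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compare two stack snapshots of the current JVM.
class StuckThreadFinder {

    // Copy the current stacks, keyed by thread id so snapshots can be matched.
    static Map<Long, StackTraceElement[]> snapshot() {
        Map<Long, StackTraceElement[]> result = new HashMap<Long, StackTraceElement[]>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            result.put(e.getKey().getId(), e.getValue());
        }
        return result;
    }

    // Ids of threads present in both snapshots with identical, non-empty stacks.
    static List<Long> unchanged(Map<Long, StackTraceElement[]> before,
                                Map<Long, StackTraceElement[]> after) {
        List<Long> ids = new ArrayList<Long>();
        for (Map.Entry<Long, StackTraceElement[]> e : before.entrySet()) {
            StackTraceElement[] later = after.get(e.getKey());
            if (later != null && e.getValue().length > 0
                    && Arrays.equals(e.getValue(), later)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, StackTraceElement[]> first = snapshot();
        Thread.sleep(5000);
        Map<Long, StackTraceElement[]> second = snapshot();
        System.out.println("threads with identical stacks 5 s apart: "
            + unchanged(first, second));
    }
}
```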
BTW, if you have JDK 1.5 on that machine, you could try running this
with 1.5 and see if it helps.
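To make the lock chain in the dump concrete, here is a simplified sketch of the pattern it shows: outputPage() holds the writer's monitor (locked <0x590efa58> at Fetcher.java:278) while the page content is gzip-compressed and appended, so every other fetcher thread queues on that monitor (waiting to lock <0x590efa58> at Fetcher.java:277). The class, the method names and the length-prefixed layout are assumptions for illustration, not Nutch's actual code; only the java.io and java.util.zip classes are real.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical model of "compress and append while holding the shared writer lock".
class CompressedAppendSketch {
    private final Object writerLock = new Object(); // stands in for the shared ArrayFile.Writer

    // Gzip the bytes in memory, then write a length prefix and the compressed data.
    static int writeCompressed(DataOutput out, byte[] bytes) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(bytes, 0, bytes.length); // the Deflater runs here, on the calling thread
        gz.close();
        byte[] compressed = bos.toByteArray();
        out.writeInt(compressed.length);
        out.write(compressed, 0, compressed.length);
        return compressed.length;
    }

    int outputPage(DataOutput segment, byte[] content) throws IOException {
        // Every fetcher thread funnels through this block, so one slow or stuck
        // compression leaves all the others "waiting to lock" the same monitor.
        synchronized (writerLock) {
            return writeCompressed(segment, content);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream segment = new DataOutputStream(sink);
        byte[] page = new byte[1 << 20]; // 1 MB of zeros compresses very well
        int written = new CompressedAppendSketch().outputPage(segment, page);
        System.out.println("compressed 1 MB page to " + written + " bytes");
    }
}
```

In this structure, if the compression call in one thread never returns, nothing lets the other threads make progress, which matches the symptom of the whole fetcher freezing while a single thread shows as runnable inside Deflater.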
--
Best regards,
Andrzej Bialecki <><
Re: Nutch freezing on fetch
Posted by "Insurance Squared Inc." <gc...@insurancesquared.com>.
Hi,
Here's the output from when it freezes. Sorry it's a bit verbose; we
weren't sure what we're looking for, so I've included it all:
051230 131519 fetching http://www.municipalaffairs.gov.ab.ca/fco/pdf/ab-clan6-1.pdf
Full thread dump Java HotSpot(TM) Client VM (1.4.2_10-b03 mixed mode):

"fetcher9" prio=1 tid=0x082f5b40 nid=0x1442 waiting for monitor entry [5097a000..5097a228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher8" prio=1 tid=0x08165ed8 nid=0x1442 waiting for monitor entry [509fb000..509fb228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher7" prio=1 tid=0x0816bb28 nid=0x1442 waiting for monitor entry [50a7c000..50a7c228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher6" prio=1 tid=0x0816b9c0 nid=0x1442 waiting for monitor entry [50afd000..50afd228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher5" prio=1 tid=0x081672f0 nid=0x1442 waiting for monitor entry [50b7e000..50b7e228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher4" prio=1 tid=0x082f73a0 nid=0x1442 waiting for monitor entry [50bff000..50bff228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher3" prio=1 tid=0x08168768 nid=0x1442 waiting for monitor entry [50c80000..50c80228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher2" prio=1 tid=0x08168320 nid=0x1442 waiting for monitor entry [50d01000..50d01228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher1" prio=1 tid=0x081640c0 nid=0x1442 runnable [50d86000..50d87228]
    at java.util.zip.Deflater.deflateBytes(Native Method)
    at java.util.zip.Deflater.deflate(Deflater.java:287)
    - locked <0x61da0728> (a java.util.zip.Deflater)
    at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:154)
    at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:114)
    at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:72)
    - locked <0x61da0960> (a java.util.zip.GZIPOutputStream)
    at org.apache.nutch.io.WritableUtils.writeCompressedByteArray(WritableUtils.java:53)
    at org.apache.nutch.protocol.Content.write(Content.java:81)
    at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
    at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
    - locked <0x590efa90> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
    - locked <0x590efa90> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:278)
    - locked <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher0" prio=1 tid=0x08163d68 nid=0x1442 waiting for monitor entry [50e08000..50e08228]
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
    - waiting to lock <0x590efa58> (a org.apache.nutch.io.ArrayFile$Writer)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"Signal Dispatcher" daemon prio=1 tid=0x080974d0 nid=0x1442 runnable [0..0]

"Finalizer" daemon prio=1 tid=0x08092448 nid=0x1442 in Object.wait() [51588000..51588228]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x58fe8c40> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
    - locked <0x58fe8c40> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x080918a0 nid=0x1442 in Object.wait() [51609000..51609228]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x58fe8ca8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:429)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
    - locked <0x58fe8ca8> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0805be28 nid=0x1442 waiting on condition [bfffd000..bfffd2dc]
    at java.lang.Thread.sleep(Native Method)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:351)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:488)

"VM Thread" prio=1 tid=0x08090638 nid=0x1442 runnable

"VM Periodic Task Thread" prio=1 tid=0x08099cd8 nid=0x1442 waiting on condition

"Suspend Checker Thread" prio=1 tid=0x08096b28 nid=0x1442 runnable
Thanks again.
Re: Nutch freezing on fetch
Posted by Andrzej Bialecki <ab...@getopt.org>.
Insurance Squared Inc. wrote:
> Hi All,
>
> We're experiencing problems with Nutch freezing sporadically when
> fetching, and we're not really sure where to even start investigating. Some
> digging into the archives suggested memory issues, so we did the
> following:
>
> TOMCAT_OPTS=" -Xmx1024M" to increase Tomcat memory and
> NUTCH_HEAPSIZE=1500 to increase Nutch memory.
>
> No effect - it's still freezing. We're running a current version of
> Mandrake Linux on its own box, 2 GB of RAM, 2.4 GHz P4. I believe
> we're running version 0.7 of Nutch. Does anyone have any suggestions as
> to where I should start investigating?
Do you use the parse-pdf plugin? Please take a thread dump of the stuck
process (on Linux, press Ctrl-\ in its console or run kill -3 <pid>).
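The dump requested here is what the JVM prints when it receives SIGQUIT. On Java 5 or later, a similar summary can also be produced from inside the process. The sketch below (the class name LockReport is hypothetical; it is not part of Nutch, and it will not compile on a 1.4.2 JVM) lists each thread's state and, for threads blocked or waiting on an object, which thread currently holds it. That is the pattern to look for in a frozen fetcher: many fetcher threads BLOCKED on one writer, all held by a single thread. findMonitorDeadlockedThreads() then separates a true deadlock from a single slow lock holder.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (Java 5+): an in-process "who is blocked on whom" report.
class LockReport {

    static String describe(ThreadInfo info) {
        // getLockName() is the object the thread is blocked or waiting on;
        // getLockOwnerName() is the thread holding it (null if none).
        return info.getThreadName() + " is " + info.getThreadState()
            + (info.getLockName() == null ? ""
               : " on " + info.getLockName() + " held by " + info.getLockOwnerName());
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = mx.getThreadInfo(mx.getAllThreadIds(), 0);
        for (int i = 0; i < infos.length; i++) {
            if (infos[i] != null) { // a thread may exit between the two calls
                System.out.println(describe(infos[i]));
            }
        }
        // Non-null only for a cycle of threads blocked on each other's monitors.
        long[] deadlocked = mx.findMonitorDeadlockedThreads();
        System.out.println("monitor deadlock detected: " + (deadlocked != null));
    }
}
```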
--
Best regards,
Andrzej Bialecki <><