Posted to dev@nutch.apache.org by Marko Bauhardt <mb...@media-style.com> on 2005/10/18 09:50:49 UTC
RegexUrlFilter hangs up
Hi all,
I use nutch-mapred from the svn branch. Sometimes the reduce job of
the fetch process hangs. The thread dump shows that the
RegexURLFilter is running.
In regex-urlfilter.txt I commented out the line
#-[?*!@=]
because I want to fetch dynamic URLs such as JSPs.
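For readers unfamiliar with the rule file, here is a minimal sketch of how such rules are usually evaluated: each line starts with + (accept) or - (reject) followed by a regular expression, lines starting with # are ignored, and the first rule whose pattern is found in the URL decides. This is only an illustration under assumptions: the class UrlRuleSketch and its methods are hypothetical, and it uses java.util.regex rather than the ORO Perl5Matcher that Nutch's RegexURLFilter uses, so it does not reproduce Nutch's actual code or its performance behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Nutch RegexURLFilter implementation.
public class UrlRuleSketch {

    /** One rule: accept (+) or reject (-) URLs in which the pattern is found. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    /** Parses rule lines, skipping blank lines and lines commented out with '#'. */
    static List<Rule> parse(String[] lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // a commented-out rule has no effect
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
        return rules;
    }

    /** The first rule whose pattern is found in the URL decides; no match means reject. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String url = "http://example.com/page.jsp?id=1"; // made-up dynamic URL

        // Rule active: URLs containing ?, *, !, @ or = are rejected.
        List<Rule> active = parse(new String[] {"-[?*!@=]", "+."});
        // Rule commented out: the same URL falls through to the accept-all rule.
        List<Rule> disabled = parse(new String[] {"#-[?*!@=]", "+."});

        System.out.println(accepts(active, url));   // prints false
        System.out.println(accepts(disabled, url)); // prints true
    }
}
```

With the rule commented out, every URL that carries a query string reaches the remaining patterns, so the filter is applied to more and longer URLs than before.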
Here is the thread dump.
051017 151123 reduce > reduce
Full thread dump Java HotSpot(TM) Client VM (1.4.2_08-b03 mixed mode):

"MultiThreadedHttpConnectionManager cleanup" daemon prio=1 tid=0x08249fa0 nid=0x7645 in Object.wait() [6d489000..6d489868]
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
    - locked <0x753a19c0> (a java.lang.ref.ReferenceQueue$Lock)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1100)

"Thread-1" prio=1 tid=0x082149b0 nid=0x7645 runnable [6efc3000..6efc3868]
    at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
    at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
    at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
    at org.apache.oro.text.regex.Perl5Matcher.__tryExpression(Unknown Source)
    at org.apache.oro.text.regex.Perl5Matcher.__interpret(Unknown Source)
    at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
    at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
    at org.apache.nutch.net.RegexURLFilter.filter(RegexURLFilter.java:114)
    - locked <0x753d8cc8> (a org.apache.nutch.net.RegexURLFilter)
    at org.apache.nutch.net.URLFilters.filter(URLFilters.java:77)
    at org.apache.nutch.crawl.ParseOutputFormat$1.write(ParseOutputFormat.java:71)
    at org.apache.nutch.crawl.FetcherOutputFormat$1.write(FetcherOutputFormat.java:78)
    at org.apache.nutch.mapred.ReduceTask$2.collect(ReduceTask.java:247)
    at org.apache.nutch.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
    at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
    at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)

"Signal Dispatcher" daemon prio=1 tid=0x080a6ff8 nid=0x7645 waiting on condition [0..0]

"Finalizer" daemon prio=1 tid=0x080933e8 nid=0x7645 in Object.wait() [70159000..70159868]
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
    - locked <0x75350780> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x08091978 nid=0x7645 in Object.wait() [701da000..701da868]
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:429)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
    - locked <0x753507e8> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0805c0d8 nid=0x7645 waiting on condition [bfffb000..bfffb41c]
    at java.lang.Thread.sleep(Native Method)
    at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:294)
    at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:333)
    at org.apache.nutch.crawl.Fetcher.main(Fetcher.java:362)

"VM Thread" prio=1 tid=0x08090718 nid=0x7645 runnable

"VM Periodic Task Thread" prio=1 tid=0x6fb01420 nid=0x7645 waiting on condition

"Suspend Checker Thread" prio=1 tid=0x080a65f0 nid=0x7645 runnable
Re: RegexUrlFilter hangs up
Posted by Marko Bauhardt <mb...@media-style.com>.
On 18.10.2005 at 17:55, Doug Cutting wrote:
> What makes you think that the fetcher is hung?
Neither the log file nor the size of the segment grew any further.
I waited about 8 hours, but the last entry in my log file is
still '051017 151123 reduce > reduce'.
I use mapred on the local file system.
Re: RegexUrlFilter hangs up
Posted by Doug Cutting <cu...@nutch.org>.
What makes you think that the fetcher is hung?
Doug