Posted to user@nutch.apache.org by Jon Shoberg <jo...@shoberg.net> on 2005/09/27 02:26:04 UTC

fetcher hangs and thread lifetime

   Is there a way to set the lifetime of a fetching thread, so that if it 
cannot complete the entire fetching process in X minutes it gracefully 
gives up?

   Has anyone else seen the fetcher hang for a long period of time 
(an hour or more)?  I'm using 100 threads, 30 per host.  I'm guessing 
that it is "stuck" on one host.
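
   To make the "give up after X minutes" idea concrete, here is a minimal 
sketch of one way it might be done in plain Java with java.util.concurrent. 
It is not Nutch code: TimeLimitedTask and callWithTimeout are hypothetical 
names, and the task being limited is whatever Callable the caller passes 
in. One caveat: cancelling only requests interruption, so a thread spinning 
in CPU-bound code that never checks the interrupt flag keeps running in 
the background; the caller merely stops waiting for it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Nutch): runs a task and gives up after a deadline. */
final class TimeLimitedTask {

    // Daemon workers, so an abandoned task that is still running cannot keep the JVM alive.
    private static final ExecutorService POOL =
        Executors.newCachedThreadPool(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "time-limited-worker");
                t.setDaemon(true);
                return t;
            }
        });

    private TimeLimitedTask() {}

    /**
     * Returns the task's result, or fallback if the task throws, the caller
     * is interrupted, or the task does not finish within the timeout.
     */
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit, T fallback) {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Requests interruption only; the worker stops early just if it
            // blocks interruptibly or checks its interrupt flag.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The task itself failed; treat it the same as giving up.
            return fallback;
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status and stop waiting.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }
}
```

   A caller could wrap a slow step (parsing one page, for example) in a 
Callable and pass a fallback value that marks the page as skipped. The 
trade-off of this design is that each abandoned task may still hold a 
thread and its memory until it finishes on its own.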

-j

*** Uptime and CPU *****************************************************

[jon@crawlr~]$ uptime
  00:25:14 up 4 days,  5:24,  4 users,  load average: 10.94, 10.82, 10.07

*** Thread Dump ********************************************************

         Full thread dump Java HotSpot(TM) 64-Bit Server VM 
(1.5.0_04-b05 mixed mode):

"fetcher124" prio=1 tid=0x00002aabb5504cf0 nid=0x541e runnable 
[0x0000000048a42000..0x0000000048a42b30]
         at java.lang.String.<init>(String.java:208)
         at java.lang.StringBuffer.toString(StringBuffer.java:586)
         - locked <0x00002aab226bdaf8> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher117" prio=1 tid=0x00002aabb4a08e00 nid=0x5415 runnable 
[0x000000004833b000..0x000000004833beb0]
         at java.lang.String.<init>(String.java:208)
         at java.lang.StringBuffer.toString(StringBuffer.java:586)
         - locked <0x00002aab226bdb40> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher112" prio=1 tid=0x00002aabb4a03810 nid=0x5410 runnable 
[0x0000000047e36000..0x0000000047e36d30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab880a56d0> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher106" prio=1 tid=0x00002aabb4d7d720 nid=0x540a runnable 
[0x0000000047830000..0x0000000047830c30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab1f412728> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher101" prio=1 tid=0x00002aabb4242b30 nid=0x5405 runnable 
[0x000000004732b000..0x000000004732beb0]
         at java.lang.String.<init>(String.java:208)
         at java.lang.StringBuffer.toString(StringBuffer.java:586)
         - locked <0x00002aaba26d80c8> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher100" prio=1 tid=0x00002aabb4241a00 nid=0x5404 runnable 
[0x000000004722a000..0x000000004722ab30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab8d3359d0> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher97" prio=1 tid=0x00002aabb423e670 nid=0x5401 runnable 
[0x0000000046f27000..0x0000000046f27cb0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab9057f610> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher96" prio=1 tid=0x00002aabb423d540 nid=0x5400 runnable 
[0x0000000046e26000..0x0000000046e26d30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab226bdcd0> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher94" prio=1 tid=0x00002aabb42ea7a0 nid=0x53fe runnable 
[0x0000000046c24000..0x0000000046c24e30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab98eeeaf0> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher91" prio=1 tid=0x00002aabb42e7410 nid=0x53fb runnable 
[0x0000000046921000..0x0000000046921bb0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab226bdd68> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher81" prio=1 tid=0x00002aabb4a379c0 nid=0x53f1 runnable 
[0x0000000045f17000..0x0000000045f17cb0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab8fd6e840> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher80" prio=1 tid=0x00002aabb4a36890 nid=0x53f0 runnable 
[0x0000000045e16000..0x0000000045e16d30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aaba26d0168> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher79" prio=1 tid=0x00002aabb4a35760 nid=0x53ef runnable 
[0x0000000045d15000..0x0000000045d15db0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab941bbfe8> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher73" prio=1 tid=0x00002aabb5179040 nid=0x53e9 runnable 
[0x000000004570f000..0x000000004570fcb0]
         at 
org.apache.xerces.dom.CharacterDataImpl.setNodeValueInternal(Unknown Source)
         at org.apache.xerces.dom.CharacterDataImpl.setNodeValue(Unknown 
Source)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher66" prio=1 tid=0x00002aabb518a3f0 nid=0x53e2 runnable 
[0x0000000045008000..0x0000000045008c30]
         at java.lang.String.<init>(String.java:208)
         at java.lang.StringBuffer.toString(StringBuffer.java:586)
         - locked <0x00002aab85b2b0a8> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher65" prio=1 tid=0x00002aabb56d2660 nid=0x53e1 runnable 
[0x0000000044f07000..0x0000000044f07cb0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aaba26d0240> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher61" prio=1 tid=0x00002aabb0d5b7c0 nid=0x53dd runnable 
[0x0000000044b03000..0x0000000044b03eb0]
         at java.lang.String.<init>(String.java:208)
         at java.lang.StringBuffer.toString(StringBuffer.java:586)
         - locked <0x00002aab9cc4a970> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher54" prio=1 tid=0x00002aabb0d44ad0 nid=0x53d6 runnable 
[0x00000000443fc000..0x00000000443fce30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab226bbd40> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher51" prio=1 tid=0x00002aabb4243760 nid=0x53d3 runnable 
[0x00000000440f9000..0x00000000440f9bb0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab923fdee0> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher44" prio=1 tid=0x00002aabb421ce60 nid=0x53cc runnable 
[0x00000000439f2000..0x00000000439f2b30]
         at java.lang.String.<init>(String.java:208)
         at java.lang.StringBuffer.toString(StringBuffer.java:586)
         - locked <0x00002aab8eb471c8> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher42" prio=1 tid=0x00002aabb4a12a10 nid=0x53ca runnable 
[0x00000000437f0000..0x00000000437f0c30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aaba26d8300> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher38" prio=1 tid=0x00002aabb4c618d0 nid=0x53c6 runnable 
[0x00000000433ec000..0x00000000433ece30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab9c521590> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher31" prio=1 tid=0x00002aabb46dfb30 nid=0x53bf runnable 
[0x0000000042ce5000..0x0000000042ce5db0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab1f414868> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher30" prio=1 tid=0x00002aabb46d6710 nid=0x53be runnable 
[0x0000000042be4000..0x0000000042be4e30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab226bbe68> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher21" prio=1 tid=0x00002aabb4667910 nid=0x53b5 runnable 
[0x00000000422db000..0x00000000422dbeb0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aaba26d04e0> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher9" prio=1 tid=0x00002aabb4e18bb0 nid=0x53a9 runnable 
[0x00000000416cf000..0x00000000416cfcb0]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aab90bf8db8> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"fetcher2" prio=1 tid=0x00002aabb4919c40 nid=0x53a2 runnable 
[0x0000000040fc8000..0x0000000040fc8c30]
         at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
         at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
         at java.lang.StringBuffer.append(StringBuffer.java:225)
         - locked <0x00002aaba26d8428> (a java.lang.StringBuffer)
         at org.apache.xerces.dom.CharacterDataImpl.appendData(Unknown 
Source)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.characters(DOMFragmentParser.java:463)
         at 
org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
         at 
org.cyberneko.html.HTMLTagBalancer.characters(HTMLTagBalancer.java:821)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:1972)
         at 
org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1775)
         at 
org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
         at 
org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
         at 
org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
         at 
org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
         at 
org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:213)
         at 
org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:156)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:254)
         at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

"Low Memory Detector" daemon prio=1 tid=0x00002aabb3d03bd0 nid=0x539e 
runnable [0x0000000000000000..0x0000000000000000]

"CompilerThread1" daemon prio=1 tid=0x00002aabb3d02110 nid=0x539d 
waiting on condition [0x0000000000000000..0x0000000040ac27d0]

"CompilerThread0" daemon prio=1 tid=0x00002aabb3d00d10 nid=0x539c 
waiting on condition [0x0000000000000000..0x00000000409c1450]

"AdapterThread" daemon prio=1 tid=0x00002aabb0cd8d30 nid=0x539b waiting 
on condition [0x0000000000000000..0x0000000000000000]

"Signal Dispatcher" daemon prio=1 tid=0x00002aabb0cd7990 nid=0x539a 
runnable [0x0000000000000000..0x0000000000000000]

"Finalizer" daemon prio=1 tid=0x00002aabb0cc5480 nid=0x5399 in 
Object.wait() [0x00000000406bf000..0x00000000406bfcb0]
         at java.lang.Object.wait(Native Method)
         - waiting on <0x00002aaaf4b9c3b0> (a 
java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
         - locked <0x00002aaaf4b9c3b0> (a java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x00002aabb0cc22e0 nid=0x5398 in 
Object.wait() [0x00000000405be000..0x00000000405bed30]
         at java.lang.Object.wait(Native Method)
         - waiting on <0x00002aaaf4bc37d0> (a java.lang.ref.Reference$Lock)
         at java.lang.Object.wait(Object.java:474)
         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
         - locked <0x00002aaaf4bc37d0> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0000000040115bc0 nid=0x5390 waiting on condition 
[0x00007fffffa88000..0x00007fffffa88830]
         at java.lang.Thread.sleep(Native Method)
         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:351)
         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:488)

"VM Thread" prio=1 tid=0x00000000401bab30 nid=0x5397 runnable

"GC task thread#0 (ParallelGC)" prio=1 tid=0x00000000401a85c0 nid=0x5395 
runnable

"GC task thread#1 (ParallelGC)" prio=1 tid=0x00000000401a8c60 nid=0x5396 
runnable

"VM Periodic Task Thread" prio=1 tid=0x00002aabb3d05980 nid=0x539f 
waiting on condition
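
The fetcher threads shown in the dump above are all runnable inside the HTML parse step (HtmlParser.getParse calling into NekoHTML), not blocked on the network. One general way to bound such a step is to run it on a worker thread and wait for it with a time limit. The following is only a minimal sketch of that technique, not Nutch's own code: the class BoundedTask and the method callWithTimeout are hypothetical names, and it assumes Java 8 or later for the lambda syntax (the dump is from a 1.5 VM). Note that cancel(true) only interrupts the worker, so a CPU-bound parse that never checks for interruption may keep running in the background even after the caller has given up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Nutch: runs a task with a time limit.
public class BoundedTask {

    /**
     * Runs the task on a separate worker thread and waits at most
     * `timeout` for its result. Returns `fallback` if the limit is hit.
     */
    public static <T> T callWithTimeout(Callable<T> task, long timeout,
                                        TimeUnit unit, T fallback)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker; code that ignores interruption keeps running.
            future.cancel(true);
            return fallback;
        } finally {
            // Stop accepting work and interrupt anything still running.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a slow parse: sleeps far longer than the limit allows.
        String result = callWithTimeout(() -> {
            Thread.sleep(5000);
            return "parsed";
        }, 200, TimeUnit.MILLISECONDS, "gave up");
        System.out.println(result);
    }
}
```

In a crawler, the fallback value would stand for "skip this page and move on", so that one pathological document cannot hold a thread indefinitely.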

Re: Map Reduce

Posted by Jack Tang <hi...@gmail.com>.
Hi Gal

You can get the original paper from Google Labs:
              http://labs.google.com/papers/mapreduce.html
and some presentations on the Nutch wiki:
              http://wiki.apache.org/nutch/Presentations

Hope these resources help.

Regards
/Jack

On 9/27/05, Gal Nitzan <gn...@usa.net> wrote:
> Hi,
>
> Can someone please refer me to some info on map reduce, or describe it a
> little?
>
> Thanks,
>
> Gal
>


--
Keep Discovering ... ...
http://www.jroller.com/page/jmars

Map Reduce

Posted by Gal Nitzan <gn...@usa.net>.
Hi,

Can someone please refer me to some info on map reduce, or describe it a 
little?

Thanks,

Gal

Re: fetcher hangs and thead lifetime

Posted by Jon Shoberg <jo...@shoberg.net>.
 > Jon Shoberg wrote:
 >>
 >>   Is there a way to set the lifetime of a fetching thread?  As in if
 >> it can not complete the entire fetching process in X minutes to
 >> gracefully give up?
 >>
 >>   Anyone else experience the fetcher hanging for a long period of time
 >> (hour+)?  I'm using 100 threads, 30 per host.  I'm guessing that I
 >> have one host which it is "stuck" on.

 > Paul van Brouwershaven wrote:
> Hello Jon,
> 
> I have the same problem here: the fetcher gets stuck after running for a 
> few hours.
> 
> How do you get a good crawler if you must repair the database and start 
> again every time?
> 

I run on stable hardware, so I run everything within a screen process, 
which allows me to watch interactively what's going on.  I'm in the 
testing phase of a Nutch implementation, so I pay close attention to it.

My request to experienced Nutch users / developers:

The wiki has good info.  It would be helpful to hear about people's 
small, medium, and large implementations.  What configurations are used? 
What tweaks to the conf files?  What are the performance bottlenecks? 
What are the common implementation problems, and how do you fix them? 
How have you allowed for dynamic URLs (question marks)?

I'd be willing to aggregate the input into wiki entries.

For myself, I'm running a crawling script inside a screen process.  This 
allows me to SSH in, see what's going on at the console, and gracefully 
exit the session.  If I don't like a crawling session, I'll CTRL-C it and 
let the script keep going.

The perl script generates a segment with -numFetchers and starts calling 
the fetcher via a system call.
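
Since the fetcher is started as an external process by a driver script, the same driver could also enforce a time limit on each run. The sketch below shows that idea in Java rather than Perl. It is an illustration under stated assumptions, not the script described above: the class TimedCommand, the one-hour limit, and the five-second grace period are all hypothetical, and it assumes Java 8 or later for Process.waitFor with a timeout. The command to run is taken from the program arguments, so no particular fetcher command line is implied.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical driver, not part of Nutch: runs a command with a time limit.
public class TimedCommand {

    /**
     * Starts the command and waits at most timeoutSeconds for it to exit.
     * Returns the exit code, or -1 if the process had to be killed.
     */
    public static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()   // show the child's output on our console
                .start();
        if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return process.exitValue();
        }
        // Time limit exceeded: ask the process to stop, then force it.
        process.destroy();
        if (!process.waitFor(5, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            process.waitFor();
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java TimedCommand <command> [args...]");
            return;
        }
        // Assumed limit of one hour per run; adjust to the crawl at hand.
        int code = run(Arrays.asList(args), 3600);
        System.out.println(code == -1 ? "timed out, killed" : "exit code " + code);
    }
}
```

Killing the whole process is blunter than stopping a single thread, but it also works when a thread is busy in code that ignores interruption. Whether the segment being fetched is still usable after such a kill is a separate question that this sketch does not address.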

-j


Re: fetcher hangs and thead lifetime

Posted by Paul van Brouwershaven <pa...@vanbrouwershaven.com>.
Hello Jon,

I have the same problem here: the fetcher gets stuck after running for a 
few hours.

How do you get a good crawler if you must repair the database and start 
again every time?

Jon Shoberg wrote:
> 
>   Is there a way to set the lifetime of a fetching thread?  As in if it 
> can not complete the entire fetching process in X minutes to gracefully 
> give up?
> 
>   Anyone else experience the fetcher hanging for a long period of time 
> (hour+)?  I'm using 100 threads, 30 per host.  I'm guessing that I have 
> one host which it is "stuck" on.