Posted to dev@nutch.apache.org by "Jacob Brunson (JIRA)" <ji...@apache.org> on 2006/08/10 06:22:16 UTC

[jira] Commented: (NUTCH-344) Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks

    [ http://issues.apache.org/jira/browse/NUTCH-344?page=comments#action_12427096 ] 
            
Jacob Brunson commented on NUTCH-344:
-------------------------------------

I'm having problems with the patch committed in revision #429779.  Before this revision I was hitting the "fetch aborted with X hung threads" problem.  After updating to this revision, fetching goes fine for a while, but then I get this error on just about every page fetch attempt:
2006-08-09 23:27:28,548 INFO  fetcher.Fetcher - fetching http://www.xmission.com/~nelsonb/resources.htm
2006-08-09 23:27:28,549 ERROR http.Http - java.lang.NullPointerException
2006-08-09 23:27:28,549 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.cleanExpiredServerBlocks(HttpBase.java:382)
2006-08-09 23:27:28,549 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.blockAddr(HttpBase.java:323)
2006-08-09 23:27:28,549 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:188)
2006-08-09 23:27:28,549 ERROR http.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:144)
2006-08-09 23:27:28,549 INFO  fetcher.Fetcher - fetch of http://www.xmission.com/~nelsonb/resources.htm failed with: java.lang.NullPointerException


> Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-344
>                 URL: http://issues.apache.org/jira/browse/NUTCH-344
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 0.8
>         Environment: All
>            Reporter: Greg Kim
>             Fix For: 0.8.1, 0.9.0
>
>         Attachments: cleanExpiredServerBlocks.patch, HttpBase.patch
>
>
> The recent change to the following code in HttpBase.java tends to block fetcher threads while one thread busy-waits:
>   private static void cleanExpiredServerBlocks() {
>     synchronized (BLOCKED_ADDR_TO_TIME) {
>       while (!BLOCKED_ADDR_QUEUE.isEmpty()) {   <===== LINE 3:   
>         String host = (String) BLOCKED_ADDR_QUEUE.getLast();
>         long time = ((Long) BLOCKED_ADDR_TO_TIME.get(host)).longValue();
>         if (time <= System.currentTimeMillis()) {   
>           BLOCKED_ADDR_TO_TIME.remove(host);
>           BLOCKED_ADDR_QUEUE.removeLast();
>         }
>       }
>     }
>   }
> LINE3:  As long as there are *any* entries in the BLOCKED_ADDR_QUEUE, the thread that first enters this block busy-waits until it becomes empty while all other threads block on the synchronized block.  This leads to extremely poor fetcher performance.  
> Since the checkin to respect crawlDelay in robots.txt, we are no longer guaranteed that the BLOCKED_ADDR_TO_TIME queue is a FIFO list. The simple fix is to iterate the queue once rather than busy-waiting...
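
For illustration, the "iterate the queue once" idea might look roughly like the sketch below. This is not the attached patch or the code committed in any revision. The class and method names (ServerBlockCleaner, block, cleanExpired, isBlocked) are hypothetical, it uses modern Java generics rather than the raw collections in HttpBase.java, and the null check on the map lookup is an extra defensive assumption, since unboxing a missing entry is one way a NullPointerException can arise in a loop like this.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical stand-in for the blocked-server bookkeeping; not Nutch code.
final class ServerBlockCleaner {
    private final Map<String, Long> blockedAddrToTime = new HashMap<>();
    private final LinkedList<String> blockedAddrQueue = new LinkedList<>();

    // Record that a host is blocked until the given time (epoch millis).
    void block(String host, long expiresAtMillis) {
        synchronized (blockedAddrToTime) {
            blockedAddrToTime.put(host, expiresAtMillis);
            blockedAddrQueue.addFirst(host);
        }
    }

    // Walk the queue exactly once and drop every expired entry.
    // Unexpired entries are skipped, so the lock is never held
    // while waiting for an entry to expire.
    int cleanExpired(long nowMillis) {
        int removed = 0;
        synchronized (blockedAddrToTime) {
            Iterator<String> it = blockedAddrQueue.iterator();
            while (it.hasNext()) {
                String host = it.next();
                Long time = blockedAddrToTime.get(host);
                // A missing map entry is treated as stale (assumption).
                if (time == null || time <= nowMillis) {
                    blockedAddrToTime.remove(host);
                    it.remove(); // safe removal during iteration
                    removed++;
                }
            }
        }
        return removed;
    }

    boolean isBlocked(String host) {
        synchronized (blockedAddrToTime) {
            return blockedAddrToTime.containsKey(host);
        }
    }
}
```

Because the loop makes a single pass, it does not depend on the queue being ordered by expiry time: an entry with a long crawl delay sitting in front of already-expired entries no longer stalls the cleanup, and other fetcher threads only wait for the duration of one pass.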
