You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2009/03/12 15:42:50 UTC

[jira] Created: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

fetchQueues.totalSize incorrect in Fetcher2
-------------------------------------------

                 Key: NUTCH-719
                 URL: https://issues.apache.org/jira/browse/NUTCH-719
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.0.0
            Reporter: Julien Nioche


I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0

fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0

since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in

2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.

could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Euan Clark (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836918#action_12836918 ] 

Euan Clark commented on NUTCH-719:
----------------------------------

I notice the other addFetchItem method of FetchItemQueues  and FetchItemQueue in Fetcher.java should these also be synchronized?

> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.1
>
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729833#action_12729833 ] 

Doğacan Güney commented on NUTCH-719:
-------------------------------------

Thanks for looking into this bug.

I wonder if this is the cause of the performance problem so many people are facing with Fetcher in nutch-1.0. Can it be that QueueFeeder stops feeding new URLs into FetchQueues because of this bug?

> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche reassigned NUTCH-719:
-----------------------------------

    Assignee: Julien Nioche

> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836125#action_12836125 ] 

Hudson commented on NUTCH-719:
------------------------------

Integrated in Nutch-trunk #1074 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1074/])
     fetchQueues.totalSize incorrect in Fetcher


> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.1
>
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Steven Denny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729254#action_12729254 ] 

Steven Denny edited comment on NUTCH-719 at 7/9/09 6:17 AM:
------------------------------------------------------------

I've done some investigation on this.

It looks to me as if queues can get reaped to early. I've put in some debug and this is what I see:

2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher -   FetchItemQueue::getFetchItemQueue() id=http://125.168.254.20
2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher - Created queue: http://125.168.254.20
....
2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher - reaping: http://125.168.254.20
.....
2009-07-09 04:39:50,705 DEBUG fetcher.Fetcher - addFetchItem: adding item - http://www.callidan.com/ma100.htm
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - totalSize++:2 http://125.168.254.20 http://www.callidan.com/ma100.htm queuesize: 1 queuecount: 11
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://61.9.216.193, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://216.184.34.250, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://139.146.150.23, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://203.29.78.68, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://150.101.91.39, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://209.212.110.211, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://123.176.112.44, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://117.104.160.130, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://196.25.73.205, size: 0
2009-07-09 04:39:50,884 DEBUG fetcher.Fetcher - * queue: http://202.53.7.145, size: 0
2009-07-09 04:39:50,884 DEBUG fetcher.Fetcher - * queue: http://202.60.67.145, size: 1

Note that the queue is created and then immediately reaped, and after totalSize is incremented, that queue does not appear in the list, even though it supposedly has the item added to it.

The upshot is that the url is never fetched (as the queue has gone) so totalSize never = 0, and eventually the abort will happen.

In short I'd say this is a sync issue, but I'm not sure where the best place to lock would be.

Any comments from the author?


      was (Author: stevedenny):
    I've done some investigation on this.

It looks to me as if queues can get reaped to early. I've put in some debug and this is what I see:

2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher -   FetchItemQueue::getFetchItemQueue() id=http://125.168.254.20
2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher - Created queue: http://125.168.254.20
....
2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher - reaping: http://125.168.254.20
.....
2009-07-09 04:39:50,705 DEBUG fetcher.Fetcher - addFetchItem: adding item - http://www.callidan.com/ma100.htm
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - totalSize++:2 http://125.168.254.20 http://www.callidan.com/ma100.htm queuesize: 1 queuecount: 11
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://61.9.216.193, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://216.184.34.250, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://139.146.150.23, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://203.29.78.68, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://150.101.91.39, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://209.212.110.211, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://123.176.112.44, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://117.104.160.130, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://196.25.73.205, size: 0
2009-07-09 04:39:50,884 DEBUG fetcher.Fetcher - * queue: http://202.53.7.145, size: 0
2009-07-09 04:39:50,884 DEBUG fetcher.Fetcher - * queue: http://202.60.67.145, size: 1

Not that the queue is created and then immediately reaped, and after totalSize is incremented, that queue does not appear in the list, even though it supposedly has the item added to it.
It looks as if when items are fed, there's a posibility of the queue being reaped before the item is added to the queue. However, totalSize is still incrememented.

The upshot is that the url is never fetched (as the queue has gone) so totalSize never = 0, and eventually the abort will happen.

In short I'd say this is a sync issue, but I'm not sure where the best place to lock would be.

Any comments from the author?

  
> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Steven Denny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729699#action_12729699 ] 

Steven Denny commented on NUTCH-719:
------------------------------------

I've changed line 324 of src/java/org/apache/nutch/fetcher/Fetcher.java to 

public void synchronized void addFetchItem(FetchItem it) {

(added the synchronized) and initial testing looks good.



> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Steven Denny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730253#action_12730253 ] 

Steven Denny commented on NUTCH-719:
------------------------------------

perhaps i spoke too soon

10 threads, 15520 pages, 723 errors, 3.7 pages/s, 2972 kb/s, 
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=0, fetchQueues.count=0
Aborting with 10 hung threads.
Unable to resolve: www.countryenergy.com.au, skipping.
Exception in thread "QueueFeeder" java.lang.NullPointerException
	at org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:48)
	at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:41)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:206)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
	at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:111)
	at java.io.DataInputStream.readInt(DataInputStream.java:370)
	at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1895)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1925)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2062)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
	at org.apache.nutch.fetcher.Fetcher$QueueFeeder.run(Fetcher.java:418)


It apears that the feeder hung, but I'm not sure whether the exception raised is the cause or the effect (i suspect it's the effect of the thread aborting)

I'm also not sure if any of these issues are vm related. Hopefully our real hardware will turn up soon....

> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Steven Denny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730239#action_12730239 ] 

Steven Denny commented on NUTCH-719:
------------------------------------

I'm not sure, as far as I can tell, the feeder has always finished feeding the urls, it's just that a proportion are "lost".

However, there are two things I've noted re performance (if you just look at url's crawled per second)

1) When this situation arrises, the fetcher will time out and "Abort with N hung threads". The timeout occurs after "mapred.task.timeout"/2 or seconds (default 5 mins), so any timing on a crawl that aborted will be extended by 5 mins. One a small crawl this could skew the figures

2) DNS look up can take a while. I know this has been noted before, but on my test system (admittedly only a vm on our network, with nothing special in terms of DNS), some of the look ups were taking 5-6 seconds. THis is possibley the wrong place to discuss given NUTCH-721, but I put in some debug arround the feeder thread and got:

2009-07-10 04:01:35,296 INFO  fetcher.Fetcher - Fed 500 urls in 186 secs = 2.7url/s
2009-07-10 04:04:18,343 INFO  fetcher.Fetcher - Fed 499 urls in 163 secs = 3.1url/s
2009-07-10 04:06:57,109 INFO  fetcher.Fetcher - Fed 498 urls in 158 secs = 3.2url/s
2009-07-10 04:10:38,282 INFO  fetcher.Fetcher - Fed 499 urls in 221 secs = 2.3url/s
2009-07-10 04:12:58,371 INFO  fetcher.Fetcher - Fed 498 urls in 140 secs = 3.6url/s
2009-07-10 04:16:12,275 INFO  fetcher.Fetcher - Fed 499 urls in 193 secs = 2.6url/s
2009-07-10 04:19:20,162 INFO  fetcher.Fetcher - Fed 499 urls in 187 secs = 2.7url/s
2009-07-10 04:21:25,846 INFO  fetcher.Fetcher - Fed 499 urls in 125 secs = 4.0url/s
2009-07-10 04:24:16,049 INFO  fetcher.Fetcher - Fed 495 urls in 170 secs = 2.9url/s
2009-07-10 04:27:01,944 INFO  fetcher.Fetcher - Fed 499 urls in 165 secs = 3.0url/s
2009-07-10 04:29:26,247 INFO  fetcher.Fetcher - Fed 499 urls in 144 secs = 3.5url/s
2009-07-10 04:32:02,590 INFO  fetcher.Fetcher - Fed 499 urls in 156 secs = 3.2url/s
2009-07-10 04:34:49,985 INFO  fetcher.Fetcher - Fed 498 urls in 167 secs = 3.0url/s
2009-07-10 04:37:28,367 INFO  fetcher.Fetcher - Fed 498 urls in 158 secs = 3.2url/s
2009-07-10 04:40:09,865 INFO  fetcher.Fetcher - Fed 499 urls in 161 secs = 3.1url/s
2009-07-10 04:42:55,203 INFO  fetcher.Fetcher - Fed 499 urls in 165 secs = 3.0url/s

obviously when I'm only feeding 3-4 urls/sec, i'll only every be able to fetch that. That test was one a crawldb just initialised with 11,000 urls (unique sites).

However, on the next iteration where I'm feeding urls from non-unique sites, I see 5-7 times that rate.


> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Steven Denny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729254#action_12729254 ] 

Steven Denny commented on NUTCH-719:
------------------------------------

I've done some investigation on this.

It looks to me as if queues can get reaped to early. I've put in some debug and this is what I see:

2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher -   FetchItemQueue::getFetchItemQueue() id=http://125.168.254.20
2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher - Created queue: http://125.168.254.20
....
2009-07-09 04:39:50,704 DEBUG fetcher.Fetcher - reaping: http://125.168.254.20
.....
2009-07-09 04:39:50,705 DEBUG fetcher.Fetcher - addFetchItem: adding item - http://www.callidan.com/ma100.htm
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - totalSize++:2 http://125.168.254.20 http://www.callidan.com/ma100.htm queuesize: 1 queuecount: 11
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://61.9.216.193, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://216.184.34.250, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://139.146.150.23, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://203.29.78.68, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://150.101.91.39, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://209.212.110.211, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://123.176.112.44, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://117.104.160.130, size: 0
2009-07-09 04:39:50,883 DEBUG fetcher.Fetcher - * queue: http://196.25.73.205, size: 0
2009-07-09 04:39:50,884 DEBUG fetcher.Fetcher - * queue: http://202.53.7.145, size: 0
2009-07-09 04:39:50,884 DEBUG fetcher.Fetcher - * queue: http://202.60.67.145, size: 1

Not that the queue is created and then immediately reaped, and after totalSize is incremented, that queue does not appear in the list, even though it supposedly has the item added to it.
It looks as if when items are fed, there's a posibility of the queue being reaped before the item is added to the queue. However, totalSize is still incrememented.

The upshot is that the url is never fetched (as the queue has gone) so totalSize never = 0, and eventually the abort will happen.

In short I'd say this is a sync issue, but I'm not sure where the best place to lock would be.

Any comments from the author?


> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche resolved NUTCH-719.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1

Committed revision 911905.
Thanks to S. Dennis for investigating the issue + R. Schwab for testing it 

> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.1
>
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche closed NUTCH-719.
-------------------------------


> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.1
>
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837147#action_12837147 ] 

Julien Nioche commented on NUTCH-719:
-------------------------------------

the other addFetchItem method of FetchItemQueues uses the synchronized one internally so there is no need for synch there + the one in FetchItemQueue is not necessary as it uses a synchronized collection internally. 

> fetchQueues.totalSize incorrect in Fetcher2
> -------------------------------------------
>
>                 Key: NUTCH-719
>                 URL: https://issues.apache.org/jira/browse/NUTCH-719
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.1
>
>
> I had a look at the logs generated by Fetcher2 and found cases where there were no active fetchQueues but fetchQueues.totalSize was != 0
> fetcher.Fetcher2 - -activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1, fetchQueues=0
> since the code relies on fetchQueues.totalSize to determine whether the work is finished or not the task is blocked until the abortion mechanism kicks in
> 2009-03-12 09:27:38,977 WARN  fetcher.Fetcher2 - Aborting with 200 hung threads.
> could that be a synchronisation issue? any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.