Posted to dev@nutch.apache.org by "King Kong (JIRA)" <ji...@apache.org> on 2006/09/15 06:18:23 UTC

[jira] Commented: (NUTCH-353) pages that serverside forwards will be refetched every time

    [ http://issues.apache.org/jira/browse/NUTCH-353?page=comments#action_12434881 ] 
            
King Kong commented on NUTCH-353:
---------------------------------

This is a really serious problem, because the original URLs are fetched again and again :-(

I agree with Stefan's solution.

I think this problem should attract more people's attention.

> pages that serverside forwards will be refetched every time
> -----------------------------------------------------------
>
>                 Key: NUTCH-353
>                 URL: http://issues.apache.org/jira/browse/NUTCH-353
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Stefan Groschupf
>            Priority: Blocker
>             Fix For: 0.8.1
>
>         Attachments: doNotRefecthForwarderPagesV1.patch
>
>
> Pages that do a server-side forward are not written back into the crawlDb with a status change. Also, the nextFetchTime is not changed.
> This causes the same page to be refetched again and again. As a result, Nutch is not polite: it refetches the forwarding page and the target page in each segment iteration. It also affects the scoring, since the forwarding page contributes its score to all outlinks.
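
To make the behaviour described above concrete, here is a small self-contained toy model in Java. It is only a sketch and does not use any Nutch classes: the names ForwardRefetchSketch, Entry, Status, FETCH_INTERVAL and the example.org URLs are all made up for illustration, and the "crawl db" is just an in-memory map. It shows what happens when the entry of a forwarding page is left untouched after a fetch, compared with writing a status and a new nextFetchTime back for it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Nutch APIs.
final class ForwardRefetchSketch {
    enum Status { UNFETCHED, FETCHED, REDIRECTED }

    static final class Entry {
        Status status = Status.UNFETCHED;
        long nextFetchTime = 0L; // due immediately
    }

    // Hypothetical refetch interval, in the same abstract time unit as "now".
    static final long FETCH_INTERVAL = 30L;

    final Map<String, Entry> crawlDb = new LinkedHashMap<>();
    final Map<String, String> forwards = new HashMap<>();   // source URL -> target URL
    final Map<String, Integer> fetchCount = new HashMap<>();
    final boolean writeBackForwards;

    ForwardRefetchSketch(boolean writeBackForwards) {
        this.writeBackForwards = writeBackForwards;
    }

    void inject(String url) {
        crawlDb.putIfAbsent(url, new Entry());
    }

    // Select every URL whose nextFetchTime has been reached.
    List<String> generate(long now) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Entry> e : crawlDb.entrySet()) {
            if (e.getValue().nextFetchTime <= now) {
                due.add(e.getKey());
            }
        }
        return due;
    }

    // Record a fetch and write the new status and nextFetchTime back.
    private void fetchAndMark(String url, Status status, long now) {
        fetchCount.merge(url, 1, Integer::sum);
        Entry entry = crawlDb.get(url);
        entry.status = status;
        entry.nextFetchTime = now + FETCH_INTERVAL;
    }

    void fetchAndUpdate(List<String> segment, long now) {
        for (String url : segment) {
            String target = forwards.get(url);
            if (target == null) {
                fetchAndMark(url, Status.FETCHED, now);
                continue;
            }
            // The forward is followed, so the target is fetched as well.
            inject(target);
            fetchAndMark(target, Status.FETCHED, now);
            if (writeBackForwards) {
                // Assumed fix: the forwarding page also gets a status change.
                fetchAndMark(url, Status.REDIRECTED, now);
            } else {
                // Behaviour described in the issue: the request is made,
                // but the forwarding page's entry is never updated.
                fetchCount.merge(url, 1, Integer::sum);
            }
        }
    }

    int fetches(String url) {
        return fetchCount.getOrDefault(url, 0);
    }

    // Run a number of generate/fetch/update rounds, one time unit apart.
    static ForwardRefetchSketch run(boolean writeBackForwards, int rounds) {
        ForwardRefetchSketch crawl = new ForwardRefetchSketch(writeBackForwards);
        crawl.forwards.put("http://example.org/old", "http://example.org/new");
        crawl.inject("http://example.org/old");
        for (long now = 0; now < rounds; now++) {
            crawl.fetchAndUpdate(crawl.generate(now), now);
        }
        return crawl;
    }

    public static void main(String[] args) {
        for (boolean writeBack : new boolean[] { false, true }) {
            ForwardRefetchSketch crawl = run(writeBack, 3);
            System.out.println("writeBack=" + writeBack
                + " old=" + crawl.fetches("http://example.org/old")
                + " new=" + crawl.fetches("http://example.org/new"));
        }
    }
}
```

In this toy model, without the write-back the forwarding URL stays due in every round, so both it and its target are requested in each iteration. With the write-back, both are requested once and then wait for the fetch interval. How the attached patch actually achieves this in Nutch is not shown here.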

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira