
Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/17 16:08:14 UTC

[jira] Updated: (NUTCH-321) Scoring API deficiency

     [ http://issues.apache.org/jira/browse/NUTCH-321?page=all ]

Andrzej Bialecki  updated NUTCH-321:
------------------------------------

    Attachment: patch.txt

Proposed improvements. If there are no objections I'll commit them shortly.

NOTE: this changes the API, but since v. 0.8 is still unreleased I feel it's the right time to do it.

> Scoring API deficiency
> ----------------------
>
>                 Key: NUTCH-321
>                 URL: http://issues.apache.org/jira/browse/NUTCH-321
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.8-dev
>            Reporter: Andrzej Bialecki 
>             Fix For: 0.8-dev
>
>         Attachments: patch.txt
>
>
> Currently the method ScoringFilter.updateDbScore() doesn't use the "old" value from the existing CrawlDB. Instead it uses the value taken from the fetchlist in the current segment, which is a snapshot of the "old" value taken at the moment the fetchlist was generated.
> The problem with this approach is that if/when we add the possibility to interleave generate/fetch/update cycles, the initial score values in the CrawlDatum instance that comes from the current segment could already be outdated if another updatedb was run in the meantime and changed the DB score.
> For this reason we should always assume that the value from CrawlDB, if it exists, represents the most recent version of the CrawlDatum before the update, and use this instance as a base.
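
To illustrate the rule described in the issue, here is a minimal sketch of preferring the CrawlDB entry over the fetchlist snapshot as the base for a score update. It is not the code from patch.txt and does not use the real Nutch API: the Datum class, chooseBase(), updateScore(), and the additive update rule are all hypothetical stand-ins, and the sketch assumes at least one of the two entries is non-null.

```java
public class ScoreBaseSketch {

    /** Hypothetical stand-in for CrawlDatum; only the score is modeled. */
    static final class Datum {
        final float score;

        Datum(float score) {
            this.score = score;
        }
    }

    /**
     * Picks the entry to use as the base of the update: the CrawlDB entry
     * if it exists, otherwise the snapshot stored in the segment's fetchlist.
     */
    static Datum chooseBase(Datum fromCrawlDb, Datum fromSegment) {
        return fromCrawlDb != null ? fromCrawlDb : fromSegment;
    }

    /**
     * Hypothetical update rule (an assumption for illustration only):
     * add the score contributed by inlinks to the base score.
     */
    static float updateScore(Datum fromCrawlDb, Datum fromSegment, float inlinkContribution) {
        Datum base = chooseBase(fromCrawlDb, fromSegment);
        return base.score + inlinkContribution;
    }

    public static void main(String[] args) {
        Datum snapshot = new Datum(1.0f); // score when the fetchlist was generated
        Datum current = new Datum(2.5f);  // score after an interleaved updatedb

        // The CrawlDB entry exists, so the stale snapshot is ignored.
        System.out.println(updateScore(current, snapshot, 0.5f));

        // No CrawlDB entry yet, so the snapshot is the only available base.
        System.out.println(updateScore(null, snapshot, 0.5f));
    }
}
```

With interleaved cycles, the first call shows why the distinction matters: using the snapshot as the base would silently discard the change made by the intervening updatedb.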

-- 
This message is automatically generated by JIRA.