Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2009/02/06 14:45:59 UTC
[jira] Closed: (NUTCH-357) crawling simulation
[ https://issues.apache.org/jira/browse/NUTCH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki closed NUTCH-357.
-----------------------------------
Resolution: Won't Fix
Assignee: Andrzej Bialecki
> crawling simulation
> -------------------
>
> Key: NUTCH-357
> URL: https://issues.apache.org/jira/browse/NUTCH-357
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Stefan Groschupf
> Assignee: Andrzej Bialecki
> Fix For: 1.0.0
>
> Attachments: protocol-simulation-pluginV1.patch
>
>
> We recently discovered some serious issues related to crawling and scoring. Reproducing these problems is somewhat difficult: first, it is not polite to re-crawl the same set of pages again and again, and second, it is hard to catch the page that causes a problem.
> Therefore it would be very useful to have a testbed for simulating crawls, in which we can control the responses of the "web servers".
> To begin with, simulating very basic situations, such as a page that points to itself, link chains, or internal links, would already be very useful.
> Later on, however, simulating crawls against existing data collections such as TREC or a web graph would be much more interesting, for instance to measure the quality of Nutch's OPIC implementation against the PageRank scores of the web graph, or to evaluate crawling strategies.
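The following is a minimal sketch of such a testbed. It assumes that an in-memory map from URL to a canned response (a status code plus the page's outlinks) is enough to stand in for the "web servers". The names `SimulatedWeb`, `Page`, `fetch`, `crawl` and the three scenario methods are hypothetical and chosen only for illustration. This is not Nutch's protocol plugin API, and it does not reproduce the contents of protocol-simulation-pluginV1.patch. The depth-limited breadth-first loop only loosely mimics a crawler's generate/fetch/update rounds.

```java
import java.util.*;

// Hypothetical in-memory "simulated web": every URL maps to a canned response.
// Illustrative only; this is not Nutch's Protocol interface.
class SimulatedWeb {

    /** Canned response a simulated "web server" returns for one URL. */
    static final class Page {
        final int status;            // HTTP-like status code
        final List<String> outlinks; // links the page would contain
        Page(int status, List<String> outlinks) {
            this.status = status;
            this.outlinks = outlinks;
        }
    }

    private final Map<String, Page> pages = new HashMap<>();

    /** Registers a URL that answers 200 with the given outlinks. */
    SimulatedWeb page(String url, String... outlinks) {
        pages.put(url, new Page(200, List.of(outlinks)));
        return this;
    }

    /** Unknown URLs behave like a 404 with no outlinks. */
    Page fetch(String url) {
        return pages.getOrDefault(url, new Page(404, List.of()));
    }

    /**
     * Depth-limited breadth-first crawl from one seed. Each URL is fetched at
     * most once. Returns URLs in fetch order, including ones that returned an
     * error status.
     */
    List<String> crawl(String seed, int maxDepth) {
        List<String> fetched = new ArrayList<>();
        Set<String> seen = new HashSet<>(Set.of(seed));
        List<String> frontier = List.of(seed);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                Page p = fetch(url);
                fetched.add(url);
                if (p.status != 200) continue; // do not follow links from error pages
                for (String link : p.outlinks) {
                    if (seen.add(link)) next.add(link); // enqueue only unseen URLs
                }
            }
            frontier = next;
        }
        return fetched;
    }

    /** Scenario: a page that links only to itself. */
    static SimulatedWeb selfLink() {
        return new SimulatedWeb().page("http://a/", "http://a/");
    }

    /** Scenario: a link chain a -> b -> c -> d. */
    static SimulatedWeb linkChain() {
        return new SimulatedWeb()
                .page("http://a/", "http://b/")
                .page("http://b/", "http://c/")
                .page("http://c/", "http://d/")
                .page("http://d/");
    }

    /** Scenario: internal links on one host, including one broken link. */
    static SimulatedWeb internalLinks() {
        return new SimulatedWeb()
                .page("http://site/", "http://site/a", "http://site/b")
                .page("http://site/a", "http://site/", "http://site/b", "http://site/missing")
                .page("http://site/b", "http://site/a");
    }

    public static void main(String[] args) {
        System.out.println("self link:      " + selfLink().crawl("http://a/", 10));
        System.out.println("chain, depth 2: " + linkChain().crawl("http://a/", 2));
        System.out.println("internal links: " + internalLinks().crawl("http://site/", 10));
    }
}
```

Each of the three scenarios exercises a different behavior. The self-linking page is fetched exactly once. The depth limit cuts the link chain off after http://b/. The broken internal link is fetched once and answers 404 without contributing outlinks. A scoring experiment could be attached by recording scores at the point where outlinks are enqueued, but that part is left out here.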
--
This message is automatically generated by JIRA.