Posted to dev@nutch.apache.org by "Chris Schneider (JIRA)" <ji...@apache.org> on 2007/09/23 18:25:50 UTC
[jira] Commented: (NUTCH-558) Need tool to retrieve domain statistics
[ https://issues.apache.org/jira/browse/NUTCH-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529749 ]
Chris Schneider commented on NUTCH-558:
---------------------------------------
I made a comment in the source about this, but thinking about it later, I'm not sure this version works correctly when presented with a segment directory (in addition to a crawldb). I had to rewrite the InputFormat section of the tool to fit the latest Nutch/Hadoop source environment, and in the process I removed the wrapper object that my older source environment required. I'd certainly welcome it if somebody with a more up-to-date installation and crawl data could give it a try.
> Need tool to retrieve domain statistics
> ---------------------------------------
>
> Key: NUTCH-558
> URL: https://issues.apache.org/jira/browse/NUTCH-558
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 0.9.0
> Reporter: Chris Schneider
> Assignee: Chris Schneider
> Attachments: DomainStats.patch
>
>
> Several developers have expressed interest in a tool to retrieve statistics from a crawl on a domain basis (e.g., how many pages were successfully fetched from www.apache.org vs. apache.org, where the latter total would include the former).
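
To make the "latter total would include the former" point concrete, here is a minimal sketch of per-domain roll-up counting. It is only an illustration in plain Python, not the code in DomainStats.patch and not a Nutch or Hadoop API; the function names `domain_suffixes` and `domain_stats` and the sample URLs are hypothetical, and it assumes the input is simply a list of successfully fetched URLs.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_suffixes(host):
    """Yield the host itself and each parent domain.

    For "www.apache.org" this yields "www.apache.org", "apache.org", "org".
    Assumes a dotted hostname; IP addresses are not treated specially.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def domain_stats(fetched_urls):
    """Count fetched pages per domain, rolling each page up into its parents."""
    counts = Counter()
    for url in fetched_urls:
        host = urlsplit(url).hostname
        if host is None:
            # Skip entries that have no host component.
            continue
        counts.update(domain_suffixes(host))
    return counts


if __name__ == "__main__":
    # Hypothetical sample of successfully fetched URLs.
    urls = [
        "http://www.apache.org/a",
        "http://www.apache.org/b",
        "http://apache.org/",
        "http://lucene.apache.org/nutch/",
    ]
    stats = domain_stats(urls)
    for name in ("www.apache.org", "apache.org"):
        print(name, stats[name])
```

Each page is counted once for its own host and once for every enclosing domain, so the total for apache.org covers www.apache.org and any other subdomain as well.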
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.