Posted to dev@nutch.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/10/17 20:30:44 UTC

[jira] Commented: (NUTCH-114) getting number of urls and links from crawldb

    [ http://issues.apache.org/jira/browse/NUTCH-114?page=comments#action_12332267 ] 

Doug Cutting commented on NUTCH-114:
------------------------------------

You could use UTF8 as the output key type, map to keys like "links" and "entries", and use TextOutputFormat.  The output would then be a text file containing the link and entry counts.
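
A rough sketch of that idea, written in plain Java so it runs without Nutch on the classpath. It only simulates the map and reduce steps in memory and does not use the real Nutch mapred classes, UTF8, or TextOutputFormat. The class name CrawlDbStatSketch, the method names, and the input shape (a URL paired with a link count) are all hypothetical, and the tab-separated "key, tab, value" line format is an assumption about what the text output would look like.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation; no Nutch or MapReduce classes are used.
class CrawlDbStatSketch {

    // Map step: each record emits ("entries", 1) and ("links", linkCount).
    // The (url, linkCount) record shape is an assumption for illustration.
    static List<Map.Entry<String, Long>> map(String url, long linkCount) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        out.add(new SimpleEntry<>("entries", 1L));
        out.add(new SimpleEntry<>("links", linkCount));
        return out;
    }

    // Reduce step: sum all values that share a key.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys, stable output
        for (Map.Entry<String, Long> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return totals;
    }

    // Text output: one "key<TAB>value" line per key (assumed format).
    static String format(Map<String, Long> totals) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Runs map over every record, then reduce, then formats the totals.
    static String run(Map<String, Long> records) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (Map.Entry<String, Long> r : records.entrySet()) {
            pairs.addAll(map(r.getKey(), r.getValue()));
        }
        return format(reduce(pairs));
    }

    public static void main(String[] args) {
        // Made-up sample records: URL -> number of links.
        Map<String, Long> records = new LinkedHashMap<>();
        records.put("http://example.com/", 3L);
        records.put("http://example.com/a", 0L);
        records.put("http://example.org/", 2L);
        System.out.print(run(records));
    }
}
```

With the made-up sample records in main, this prints two lines: "entries" with 3 and "links" with 5. Because every record maps to the same two keys, the reduce side only ever sees two groups, which is what keeps the final text file down to two lines.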

> getting number of urls and links from crawldb
> ---------------------------------------------
>
>          Key: NUTCH-114
>          URL: http://issues.apache.org/jira/browse/NUTCH-114
>      Project: Nutch
>         Type: New Feature
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>     Priority: Minor
>      Fix For: 0.8-dev
>  Attachments: CrawlDbStat.java, CrawlDbStatMapper.java
>
> We need a tool that provides basic statistics about the crawldb.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira