Posted to user@nutch.apache.org by Srinivasan Ramaswamy <ur...@gmail.com> on 2017/08/14 16:02:36 UTC

Measure crawl rate of a crawled website from Nutch

Is there a way to measure (some sort of stats) how many requests Nutch sent
to a website in one day or one hour? I would like to measure the crawl rate.

Here are the options I have tried so far (using a dump I created from the CrawlDb):

- Use the "tstamp" field in the index, aggregate it, and count by each
unique date/hour.
- Filter the CrawlDb by modified date (to the date being analyzed), then
aggregate again by date/hour (to make sure we count every status, not just
db_fetched).
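
Both options come down to bucketing timestamps by hour and counting them. A minimal sketch, assuming the timestamps have already been pulled out of the index or the CrawlDb dump as epoch milliseconds (if they come back as ISO-8601 strings, parse them first). The function name `crawl_rate_per_hour` and the sample data are hypothetical. The CrawlDb keeps one record per URL, so a URL fetched more than once in the window is counted only once, which may undercount the actual requests.

```python
from collections import Counter
from datetime import date, datetime, timezone


def crawl_rate_per_hour(timestamps_ms, day=None):
    """Count records per UTC hour.

    timestamps_ms: iterable of epoch-millisecond timestamps (assumed format).
    day: optional datetime.date; when given, only records on that UTC day are
         counted (mirrors "filter by date, then aggregate by hour").
    """
    counts = Counter()
    for ms in timestamps_ms:
        dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
        if day is not None and dt.date() != day:
            continue
        # Truncate to the hour to build the bucket key.
        counts[dt.strftime("%Y-%m-%d %H:00")] += 1
    # Return buckets in chronological order.
    return dict(sorted(counts.items()))


# Hypothetical sample timestamps (UTC).
SAMPLE = [
    1502726400000,  # 2017-08-14 16:00:00
    1502727000000,  # 2017-08-14 16:10:00
    1502730000000,  # 2017-08-14 17:00:00
    1502812800000,  # 2017-08-15 16:00:00
]

if __name__ == "__main__":
    print(crawl_rate_per_hour(SAMPLE))
    # {'2017-08-14 16:00': 2, '2017-08-14 17:00': 1, '2017-08-15 16:00': 1}
    print(crawl_rate_per_hour(SAMPLE, day=date(2017, 8, 14)))
    # {'2017-08-14 16:00': 2, '2017-08-14 17:00': 1}
```

Dividing a day's total by 24 (or an hour's count by 3600) gives an average rate per hour (or per second) for that window.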

Thanks
Srini