Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2017/08/30 09:07:50 UTC

Too many fetches at the same time

Hello,

Our fetcher (8 threads, fetcher.threads.per.queue = 2, fetcher.server.delay = 2.5, fetcher.server.min.delay = 0.9; we run with variable crawl-delay, but not for this host) just fetched 61 records in about one second! I can't tell whether it was the same thread, since no thread ID is logged, but this is unbelievable at best, also because our parser is quite slow. Most fetchers don't show this behaviour; yesterday and just now are the only times I have observed it so far.

Any ideas to share?

Thanks,
Markus

2017-08-29 13:20:28,901 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: fetching https://www.example.org/los-programas-en-casa-del-profesor-con-esl-escuela-de-ingles-en-casa-del-profesor-inglaterra.htm (queue crawl delay=2500ms)
2017-08-29 13:20:28,902 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: fetching https://www.example.org/portugues-comercial-en-portugal-lisboa.htm (queue crawl delay=2500ms)
....about 50+ more here
2017-08-29 13:20:28,920 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: fetching https://www.example.org/navitas-english-escuela-de-ingles-sidney-australia.htm (queue crawl delay=2500ms)
2017-08-29 13:20:28,920 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: fetching https://www.example.org/kaplan-international-escuela-de-ingles-miami-estados-unidos.htm (queue crawl delay=2500ms)
2017-08-29 13:20:28,920 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: fetching https://www.example.org/ingles-comercial-en-canada-vancouver.htm (queue crawl delay=2500ms)
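
To check how often this happens, the "fetching" lines can be counted per host and per second. The following is only a minimal sketch: it assumes the log format shown in the excerpt above, and the names fetches_per_second and bursts are hypothetical helpers, not part of Nutch.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches "fetching <url>" lines in the format of the excerpt above.
# Captures the timestamp truncated to whole seconds, and the URL.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r".*FetcherThread: fetching (?P<url>\S+)"
)


def fetches_per_second(lines):
    """Count fetch log lines per (host, second) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            host = urlsplit(match.group("url")).hostname
            counts[(host, match.group("ts"))] += 1
    return counts


def bursts(counts, threshold):
    """Keep only the (host, second) pairs with more than `threshold` fetches."""
    return {key: n for key, n in counts.items() if n > threshold}
```

Running fetches_per_second over the full fetcher log and then calling bursts with a small threshold (for example 2, matching fetcher.threads.per.queue) would list every host and second in which the configured delay was apparently not honoured.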