Posted to user@nutch.apache.org by Joseph Naegele <jn...@grierforensics.com> on 2016/03/08 16:27:31 UTC

protocol-http or protocol-httpclient?

I'm using Nutch 1.11. The description of the "plugin.includes" property in
nutch-default.xml still states that the protocol-httpclient plugin may
present intermittent problems. Is this still the case? What are the problems?

There doesn't appear to be any problem crawling HTTPS using the
protocol-http plugin. Why do I need to use protocol-httpclient for crawling
via HTTPS?

In short, I want to use the "correct" plugin because I am extending it to
perform a bit of extra work. "Correct" in this case means:
- Whichever of the two is "recommended"
- Whichever can crawl both HTTP and HTTPS connections
- Whichever performs better

Thanks,
Joe