Posted to dev@nutch.apache.org by "Jon.P" <jo...@gmail.com> on 2015/12/14 09:40:11 UTC

Deploy a Nutch crawler or use Webhose.io?

Hi all,

I need your advice!

I need to harvest blog posts and news articles and extract the date, author,
title, text and, if possible, the comments from each. The way I see it, I
have two choices: deploy a Nutch crawler or, as a friend suggested, use
Webhose.io <http://webhose.io/>.

The Webhose.io site has its own Build or Buy
<https://webhose.io/white-papers/build-or-buy> comparison, but I wanted to
hear a Nutch user's take on it.

Why did you go with Nutch and not with a service like Webhose.io? What is
the catch?

Thank you,

Jon