Posted to user@nutch.apache.org by Ali Naz <al...@yahoo.com.INVALID> on 2017/03/19 23:38:16 UTC

Crawling images with Nutch and extracting their URLs

Hi all,

I am trying to develop a small system that crawls the web and searches for specific images based on the file title, the surrounding text, and so on. When a matching image is found, its URL should be extracted and saved to a text file.

I am not doing any image processing or thumbnail generation. Do I still need an ImageParser plugin for this, or will editing the configuration files in the conf/ directory suffice?
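
In case it helps to make the goal concrete, here is a rough standalone sketch of the matching I have in mind. It is plain Python using only the standard library and has nothing to do with Nutch's own APIs. The names ImageCollector, matching_image_urls and save_urls are placeholders invented for this illustration, and "surrounding text" is approximated by the image file name plus the alt and title attributes only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects (absolute URL, descriptive text) pairs for every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        # Assumption: match on the file name plus the alt and title attributes.
        file_name = src.rsplit("/", 1)[-1]
        text = " ".join([file_name, attr.get("alt") or "", attr.get("title") or ""])
        self.images.append((urljoin(self.base_url, src), text.lower()))


def matching_image_urls(html, base_url, keywords):
    """Return absolute URLs of images whose descriptive text contains a keyword."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    wanted = [k.lower() for k in keywords]
    return [url for url, text in collector.images if any(k in text for k in wanted)]


def save_urls(urls, path):
    """Write one URL per line to a text file."""
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
```

For example, calling matching_image_urls(page_html, page_url, ["fox"]) and passing the result to save_urls would give the kind of text file I am after. What I do not know is where this logic should live inside a Nutch crawl.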