Posted to user@nutch.apache.org by Euan Clark <eu...@nzs.com> on 2009/08/05 04:22:43 UTC
Filtering by mime-type
Hi
Does anyone know of a way to filter by mime-type in the initial server
response?
We are only interested in HTML, and our problem is that many URLs give no
indication that they point to other file types, e.g. PDF, RSS, XML, etc.
Ideally I'd want the fetcher to make the request to the server and abandon
fetching based on some sort of blacklist/whitelist of mime-types.
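
To make the idea concrete, here is a rough sketch in plain Java of the check I have in mind. It is not Nutch code and does not use any Nutch API: the class name MimeTypeGate, the ALLOWED whitelist, and both method names are made up for illustration. It assumes the server sends a usable Content-Type header, and it only shows the decision logic plus a HEAD-style probe using the JDK's HttpURLConnection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class MimeTypeGate {
    // Hypothetical whitelist: only HTML-like media types are accepted.
    private static final Set<String> ALLOWED =
            Set.of("text/html", "application/xhtml+xml");

    /** Strips parameters such as "; charset=UTF-8" and lower-cases the media type. */
    static String baseType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return "";
        }
        int semi = contentTypeHeader.indexOf(';');
        String base = semi >= 0
                ? contentTypeHeader.substring(0, semi)
                : contentTypeHeader;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns true only if the header's media type is on the whitelist. */
    static boolean shouldFetch(String contentTypeHeader) {
        return ALLOWED.contains(baseType(contentTypeHeader));
    }

    /**
     * Sends a HEAD request, so only the response headers are transferred,
     * and reports whether the declared media type is on the whitelist.
     */
    static boolean probe(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return shouldFetch(conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldFetch("text/html; charset=UTF-8")); // true
        System.out.println(shouldFetch("application/pdf"));          // false
    }
}
```

A missing Content-Type header is treated as "do not fetch" here; that is just one possible policy, and a blacklist would work the same way with the test inverted.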
Cheers.