Posted to user@nutch.apache.org by Euan Clark <eu...@nzs.com> on 2009/08/05 04:22:43 UTC

Filtering by mime-type

Hi

Does anyone know of a way to filter by MIME type based on the initial server
response?
We are only interested in HTML, and we have a problem where the URLs give no
indication of the file type (e.g. PDF, RSS, XML, etc.).

Ideally, I'd want the fetcher to send the request to the server and abandon
the fetch based on some sort of blacklist/whitelist of MIME types.
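To illustrate the idea, here is a minimal standalone sketch. It uses plain
java.net.HttpURLConnection and is not Nutch's fetcher API. Hooking it into
Nutch would need work in the fetch/protocol layer, which is not shown. The
class MimeTypeFilter, its methods, and the ALLOWED whitelist (assumed here to
be HTML/XHTML only) are all hypothetical names. The sketch reads only the
response headers, checks the Content-Type, and disconnects without reading
the body if the type is not whitelisted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Nutch): decide whether to keep a
// response by looking at its Content-Type header before reading the body.
class MimeTypeFilter {

    // Assumed whitelist: we only want HTML pages.
    static final Set<String> ALLOWED = Set.of("text/html", "application/xhtml+xml");

    // Strips parameters such as "; charset=UTF-8", normalises case,
    // and checks the bare media type against the whitelist.
    static boolean isAllowed(String contentType, Set<String> allowed) {
        if (contentType == null) {
            return false; // no Content-Type header: treat as unknown and skip
        }
        int semi = contentType.indexOf(';');
        String mediaType = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return allowed.contains(mediaType);
    }

    // Sends a GET and waits for the status line and headers. If the type is
    // not whitelisted, it returns null without ever reading the body.
    // Otherwise it returns the body bytes.
    static byte[] fetchIfAllowed(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            String type = conn.getContentType(); // sends the request, reads headers
            if (!isAllowed(type, ALLOWED)) {
                return null; // abandon the fetch: body is not read
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes();
            }
        } finally {
            conn.disconnect(); // release the underlying connection
        }
    }
}
```

For example, `MimeTypeFilter.isAllowed("text/html; charset=UTF-8", MimeTypeFilter.ALLOWED)`
is true, while `"application/pdf"` or `"application/rss+xml"` are rejected.
Sending a HEAD request first is an alternative design. However, it costs an
extra round-trip, and some servers may not report the same Content-Type for
HEAD as for GET. Checking the headers of the GET itself avoids both issues.
Also note that the header is whatever the server claims, so a misconfigured
server can still mislabel content.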

Cheers.