Posted to user@nutch.apache.org by David Spencer <da...@tropo.com> on 2005/05/10 21:45:46 UTC
Re: [Nutch-general] ASP Parser
Seth Taylor wrote:
> I've recently just installed and configured Nutch from source. From
> what I've read by default, Nutch will parse text and html based
> documents only. I have a site I'm trying to crawl which is all asp
> pages. I put the asp mime type in the mime-type.xml document. What
> else do I need to do in order for Nutch to crawl asp pages?
You probably need to check the URL filter (conf/crawl-urlfilter.txt)
and make sure the pages are not rejected. Note that there may be a
pattern that rejects URLs containing query arguments, so you might want
to disable it if the pages take args.
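To see why such a pattern would drop ASP pages that take arguments, here is a minimal sketch of first-match-wins filtering. It is not Nutch's own code. The rule list is hypothetical and only written in the style of conf/crawl-urlfilter.txt ('+' accepts, '-' rejects), example.com is a placeholder host, and the assumption that a rule matches anywhere in the URL should be checked against your Nutch version.

```python
import re

# Hypothetical rules in the style of conf/crawl-urlfilter.txt.
# Check your own file; its contents may differ.
RULES = [
    r"-[?*!@=]",                              # reject probable query URLs
    r"+^http://([a-z0-9]*\.)*example\.com/",  # accept the site being crawled
    r"-.",                                    # reject everything else
]

def accepts(url, rules):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # Assumption: a rule matches if the pattern is found anywhere in the URL.
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched: treat the URL as rejected.
    return False
```

With these rules, http://www.example.com/page.asp is accepted but http://www.example.com/page.asp?id=3 is rejected, because the query rule matches first. Dropping or commenting out that rule lets the second URL through.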
I would think that there is no ASP MIME type per se; surely the
average ASP page returns an HTML document?!
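One way to confirm this for a given site is to look at the Content-Type header the server sends for an .asp URL. The following is a rough sketch using only the Python standard library. The function names are made up for illustration, and some servers may not answer HEAD requests, in which case a normal GET would be needed.

```python
import urllib.request

def is_html(content_type):
    """Return True if a Content-Type header value names an HTML media type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")

def fetch_content_type(url):
    """Ask the server for the Content-Type of a URL using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")
```

If is_html(fetch_content_type(url)) is true for one of the site's .asp URLs, the pages are served as ordinary HTML and no extra MIME type entry should be needed.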
> Thanks,
> Seth
> staylor@hhgregg.com