Posted to user@nutch.apache.org by Eugen Kochuev <eu...@lan23.net> on 2006/05/29 14:15:48 UTC
Re[2]: content-type crawling problem
Thanks for sharing the information, I'll try this. But if I understand it
correctly, parse-plugins.xml only contains rules for the parser, so
undesirable documents will still be fetched and stored in the segments.
Is it possible to stop the fetcher from crawling these pages?
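
To make the question concrete, here is a minimal sketch of the kind of pre-fetch filtering I have in mind. It is only an illustration, not Nutch code: the rule list and the should_fetch name are hypothetical, and it assumes that unwanted documents can be recognized by their URL suffix. A filter like this sees only the URL, because the actual Content-Type is not known until the server responds.

```python
import re

# Hypothetical rules in "+/-regex" style: the first matching rule decides.
# "-" rejects the URL, "+" accepts it. The suffix list is an assumption.
RULES = [
    ("-", re.compile(r"\.(pdf|zip|exe|gif|jpg|png)$", re.IGNORECASE)),
    ("+", re.compile(r".")),  # accept anything else that is non-empty
]


def should_fetch(url: str) -> bool:
    """Return True if the first matching rule accepts the URL."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched: reject by default


if __name__ == "__main__":
    for candidate in ("http://example.com/index.html",
                      "http://example.com/manual.PDF"):
        print(candidate, should_fetch(candidate))
```

Documents served under a misleading or missing suffix would still get through a check like this, so it would reduce unwanted fetches rather than eliminate them.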
> Hello,
> I also had a similar problem; my little fix was to
> edit the parse-plugins.xml file. There is the rule:
> <mimeType name="*">
>   <plugin id="parse-text" />
> </mimeType>
> Just uncomment this wildcard match. You might also check
> the other rules for further unwanted content.
> I don't know if this is the best place for such a change,
> but it worked for me.
> With best regards,
> Heiko Dietze
--
Best regards,
Eugen mailto:eugen@lan23.net