Posted to user@nutch.apache.org by Eugen Kochuev <eu...@lan23.net> on 2006/05/29 14:15:48 UTC

Re[2]: content-type crawling problem

Thanks for sharing the information, I'll try this. But if I understand
correctly, parse-plugins.xml only contains rules for the parser, so
undesirable documents will still be fetched and stored in the segments.
Is it possible to stop the fetcher from crawling these pages?
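
To check that I read the suggestion correctly: assuming "uncomment" in the
quoted message means disabling the wildcard rule, my parse-plugins.xml
would end up with something like the fragment below. This is only a sketch
of my understanding, not a tested configuration, and the rest of the file
stays as it is.

```xml
<!-- Wildcard rule disabled so that unmatched content types are no longer
     handed to parse-text (my assumption about the intended change)
<mimeType name="*">
   <plugin id="parse-text" />
</mimeType>
-->
```

As far as I can tell this only affects parsing, which is why I am asking
about the fetcher.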

> Hello,

> I also had a similar problem; my small fix was to
> edit the parse-plugins.xml file. There is the rule:

> <mimeType name="*">
>    <plugin id="parse-text" />
> </mimeType>

> Just uncomment this wildcard match. You might also check
> the other rules for further unwanted content.

> I don't know if this is the best place for such a change,
> but it worked for me.

> with best regards,

> Heiko Dietze



-- 
Best regards,
 Eugen                            mailto:eugen@lan23.net