Posted to user@nutch.apache.org by swaraj <sw...@minglebox.com> on 2011/11/10 07:53:40 UTC
how to remove meta description tag from content
Hi,
I have a Nutch and Solr based crawling setup working, but I have a use case in
mind that I want to implement.
I want to keep the meta description content (e.g. <meta name="description"
content="some page content"/>) from being parsed as part of the content of
the crawled page.
How do I achieve that?
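One possible approach is to drop every <meta name="description"> element from the page's DOM before any text extraction runs. The sketch below is standalone and uses only the JDK's built-in DOM parser. It is not Nutch's own parsing API, and it assumes the page is well-formed XHTML. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: strips <meta name="description"> elements from a DOM.
public class StripMetaDescription {

    // Removes every <meta name="description"> element and returns how many were removed.
    static int removeMetaDescription(Document doc) {
        NodeList metas = doc.getElementsByTagName("meta");
        int removed = 0;
        // The NodeList is live (it shrinks as nodes are removed), so iterate backwards.
        for (int i = metas.getLength() - 1; i >= 0; i--) {
            Element meta = (Element) metas.item(i);
            if ("description".equalsIgnoreCase(meta.getAttribute("name"))) {
                meta.getParentNode().removeChild(meta);
                removed++;
            }
        }
        return removed;
    }

    // Assumption: the input is well-formed XHTML that the JDK XML parser accepts.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>T</title>"
            + "<meta name=\"description\" content=\"some page content\"/>"
            + "</head><body><p>Body text</p></body></html>";
        Document doc = parse(page);
        System.out.println("removed: " + removeMetaDescription(doc));
        System.out.println("meta left: " + doc.getElementsByTagName("meta").getLength());
    }
}
```

Iterating backwards is deliberate. Removing a node from a live NodeList while walking it forwards would skip the element that slides into the removed slot.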
I have already written an index plugin that extracts the meta description as
a separate field for Solr to index.
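Extracting the description as its own value comes down to reading the content attribute of the matching meta element. The standalone sketch below does this with the JDK DOM parser and assumes well-formed XHTML input. It is not the poster's index plugin and does not use any Nutch or Solr API, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads the meta description so it can be stored as a separate field.
public class ExtractMetaDescription {

    // Returns the content attribute of the first <meta name="description">, or null if none exists.
    static String metaDescription(Document doc) {
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if ("description".equalsIgnoreCase(meta.getAttribute("name"))) {
                return meta.getAttribute("content");
            }
        }
        return null;
    }

    // Assumption: the input is well-formed XHTML.
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><head>"
            + "<meta name=\"description\" content=\"some page content\"/>"
            + "</head><body><p>Body text</p></body></html>");
        System.out.println(metaDescription(doc));
    }
}
```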
Any pointers will be much appreciated.
Regards,
Swaraj Yadav