Posted to user@nutch.apache.org by swaraj <sw...@minglebox.com> on 2011/11/10 07:53:40 UTC

how to remove meta description tag from content

Hi,

I have a working Nutch and Solr crawling setup, and there is one more use
case I would like to implement.

I want to exclude the content of the meta description tag (e.g.
<meta name="description" content="some page content"/>) from the text that
is parsed as the content of the crawled page.
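
To make the goal concrete, here is a minimal sketch of the behaviour I am
after. It uses only the plain JDK DOM API, not Nutch itself. The class and
method names are made up for illustration, and it assumes well-formed XHTML
input with lowercase tag names. Where logic like this would hook into the
Nutch parse step is the part I am unsure about.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: illustrates the desired effect only.
public class MetaDescriptionStripper {

    // Removes every <meta name="description"> element from the document and
    // returns the removed "content" attribute values, so they can still be
    // indexed as a separate field.
    public static List<String> stripMetaDescription(Document doc) {
        // Tag matching is case-sensitive here; lowercase "meta" is an assumption.
        NodeList metas = doc.getElementsByTagName("meta");

        // Collect first: the NodeList is live and shrinks as nodes are removed.
        List<Element> toRemove = new ArrayList<>();
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if ("description".equalsIgnoreCase(meta.getAttribute("name"))) {
                toRemove.add(meta);
            }
        }

        List<String> descriptions = new ArrayList<>();
        for (Element meta : toRemove) {
            descriptions.add(meta.getAttribute("content"));
            meta.getParentNode().removeChild(meta);
        }
        return descriptions;
    }

    // Parses a well-formed XHTML string into a DOM document.
    public static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }
}
```

Calling stripMetaDescription on a parsed page would leave the rest of the
document untouched and hand back the description text separately.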

How do I achieve that?

I have already written an indexing plugin that extracts the meta description
into a separate field for Solr to index.

Any pointers would be much appreciated.

Regards,

Swaraj Yadav