Posted to user@nutch.apache.org by Ted Yu <yu...@gmail.com> on 2009/12/11 23:23:26 UTC

stripping irrelevant contents

Hi,
We want to strip irrelevant content from the web pages we crawl.
An example of irrelevant content is the display ads that surround the main
body of an article on a web page.

Please share your experience.

Thanks