Posted to user@nutch.apache.org by winz <cw...@yahoo.com> on 2009/10/10 10:12:45 UTC

Re: how can I index only a portion of html content?


Jayant Kumar Gandhi wrote:
> 
> It is possible in many ways. One of the ways to do it without using
> the HTML parser plugin is to do cloaking for your bot.
> 
> 

Hi,
Could someone please list the possible methods for achieving this?
This seems to be a common problem, but I could not find a clear answer on
this list.
I'm using a content management system named Infoglue to create my website.
Most of the pages in my site have an identical navigation bar, header and
footer.
The content of these sections shows up in the search results.

On a related note, what does de-duplication in Nutch mean, and how does
it work?
Is it possible to configure Nutch to remove duplicated content, such as the
navigation bar, during its de-duplication process?

Regards,
Winz


