Posted to user@nutch.apache.org by Eyeris Rodriguez Rueda <er...@uci.cu> on 2016/10/26 20:01:05 UTC

About canonical pages to avoid duplicate pages

Hi all.
I'm using Nutch 1.12 and Solr 4.10.3 in local mode.
I have detected a lot of duplicate pages in the CrawlDb. Maybe by using the canonical attribute I can reduce the number of duplicate pages there.
I have read an old post (see below) on this interesting topic:
https://issues.apache.org/jira/browse/NUTCH-710 

Is this feature supported by Nutch or not?