Posted to user@nutch.apache.org by Eyeris Rodriguez Rueda <er...@uci.cu> on 2016/10/26 20:01:05 UTC
About canonical pages to avoid duplicate pages
Hi all.
I'm using Nutch 1.12 and Solr 4.10.3 in local mode.
I have found a lot of duplicate pages in the CrawlDB. Maybe by using the canonical attribute I can reduce the number of duplicate pages in the CrawlDB.
I have read an old post (see below) on this topic, which I find interesting:
https://issues.apache.org/jira/browse/NUTCH-710
Is this feature supported by Nutch or not?
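To make the idea of using the canonical attribute concrete, here is a minimal Python sketch. It is not Nutch code and says nothing about whether or how NUTCH-710 was implemented. The function names are hypothetical. It reads the `<link rel="canonical" href="...">` tag from each fetched page and groups the fetched URLs that declare the same canonical target. Each group could then be kept as a single entry instead of several duplicates.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        rel_values = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical = attr_map["href"].strip()


def canonical_url(page_url, html):
    """Return the page's canonical URL, or the page URL itself if none is declared."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    parser.close()
    if parser.canonical is None:
        return page_url
    # Relative canonical hrefs are resolved against the page URL.
    return urljoin(page_url, parser.canonical)


def group_by_canonical(pages):
    """Map each canonical URL to the list of fetched URLs that declare it (hypothetical helper)."""
    groups = {}
    for url, html in pages.items():
        groups.setdefault(canonical_url(url, html), []).append(url)
    return groups


# Usage: two session-specific URLs that both point to the same canonical page.
pages = {
    "http://example.com/a?session=1": '<link rel="canonical" href="/a">',
    "http://example.com/a?session=2": '<link rel="canonical" href="http://example.com/a">',
    "http://example.com/b": "<html><head></head></html>",
}
duplicates = {c: urls for c, urls in group_by_canonical(pages).items() if len(urls) > 1}
```

In this sketch a page without a canonical tag simply maps to itself, so it never merges with anything else.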