Posted to user@nutch.apache.org by Jan Riewe <ja...@comspace.de> on 2012/03/26 18:07:06 UTC

Pages that do not dedup

Hey there,

I am currently trying to debug the dedup results from Nutch. There is a page
that is exactly the same (I compared the HTML with a diff tool) as one on a
different domain, but dedup does not delete this entry.

Is this caused by the different domain? If so, is there a way to
configure that?

Thanks in advance
Jan
--