Posted to user@mahout.apache.org by Isabel Drost <is...@apache.org> on 2009/08/11 21:53:38 UTC

Re: Finding the similarity of documents using Mahout for deduplication

On Monday 20 July 2009 10:41:17 Shashikant Kore wrote:
> You may read about Google's approach for near-duplicates.
>
> http://www2007.org/papers/paper215.pdf

A more in-depth analysis was published at this year's WWW conference (WWW 2009):

http://portal.acm.org/citation.cfm?id=1526719

Isabel

-- 
QOTD: Specifications subject to change without notice. 
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_  
 |,4-  ) )-,_..;\ (  `'-' 
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>