Posted to user@mahout.apache.org by Isabel Drost <is...@apache.org> on 2009/08/11 21:53:38 UTC
Re: Finding the similarity of documents using Mahout for deduplication
On Monday 20 July 2009 10:41:17 Shashikant Kore wrote:
> You may read about Google's approach for near-duplicates.
>
> http://www2007.org/papers/paper215.pdf
A more in-depth analysis has been published at this year's WWW:
http://portal.acm.org/citation.cfm?id=1526719
Isabel
--
QOTD: Specifications subject to change without notice.
|\ _,,,---,,_ Web: <http://www.isabel-drost.de>
/,`.-'`' -. ;-;;,_
|,4- ) )-,_..;\ ( `'-'
'---''(_/--' `-'\_) (fL) IM: <xm...@spaceboyz.net>