Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@gmail.com> on 2009/03/24 05:23:24 UTC

Broder or other near-duplicate algorithms?

Hi,

Does anybody know of an open-source implementation of the Broder
algorithm <http://www.std.org/%7Emsm/common/clustering.html> in Hadoop?
Monika Henzinger reports having done so
<http://ltaa.epfl.ch/monika/mpapers/nearduplicates2006.pdf> in MapReduce,
and I wonder if somebody has repeated her work in open source?

If there is no implementation yet, I am going to write one, and then I will
ask what I can do with the code.

Cheers,
Mark
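
For readers unfamiliar with the Broder approach asked about above, here is a
minimal single-machine sketch in Python, not the Hadoop implementation the
thread is looking for. It shows word shingling, the exact resemblance (Jaccard
similarity) of two shingle sets, and a min-hash sketch that estimates it.
Everything named here is an assumption made for illustration: the function
names, the shingle size k=4, the 64 hash functions, and the use of seeded
SHA-1 in place of the random permutations and Rabin fingerprints of the
original work.

```python
import hashlib
import re


def shingles(text, k=4):
    """Return the set of k-word shingles (contiguous word windows) of text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short documents yield a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b):
    """Exact resemblance of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(shingle, seed):
    """Seeded 64-bit hash; a stand-in for one random permutation (assumption)."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def sketch(shingle_set, num_hashes=64):
    """Min-hash sketch: the minimum hash value under each seeded hash.

    Requires a non-empty shingle set; min() raises ValueError otherwise.
    """
    return [min(_hash(s, seed) for s in shingle_set)
            for seed in range(num_hashes)]


def estimate_resemblance(sketch_a, sketch_b):
    """Fraction of sketch positions that agree; estimates the resemblance."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)
```

In a MapReduce setting, one plausible layout (again an assumption, not
something reported in this thread) is for the map step to compute a sketch per
document and emit each sketch value with the document id, and for the reduce
step to group documents that share sketch values as near-duplicate candidates.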

Re: Broder or other near-duplicate algorithms?

Posted by Mark Kerzner <ma...@gmail.com>.

Yi-Kai,
that's good to know - and I have read this article - but is your code
available?

Thank you,
Mark

On Tue, Mar 24, 2009 at 9:51 AM, Yi-Kai Tsai <yi...@yahoo-inc.com> wrote:

> hi Mark
>
> we had done something on top of hadoop/hbase (mapreduce for evaluation ,
> hbase for  online serving )
> by reference http://www2007.org/papers/paper215.pdf
>
> --
> Yi-Kai Tsai (cuma) <yi...@yahoo-inc.com>, Asia Search Engineering.
>
>

Re: Broder or other near-duplicate algorithms?

Posted by Yi-Kai Tsai <yi...@yahoo-inc.com>.

Hi Mark,

We have done something on top of Hadoop/HBase (MapReduce for evaluation,
HBase for online serving), with reference to
http://www2007.org/papers/paper215.pdf

-- 
Yi-Kai Tsai (cuma) <yi...@yahoo-inc.com>, Asia Search Engineering.