Posted to user@nutch.apache.org by Julien Nioche <li...@gmail.com> on 2010/09/28 13:56:27 UTC
Nutch use case : SimilarPages
Dear Nutch Users,
FYI, I blogged yesterday about an interesting use case of Nutch. We've
helped the guys at SimilarPages to use Nutch on EC2 for a super large crawl
(3 billion docs parsed), which was then used with a bit of MapReduce
magic to find similarities between web pages.
I will probably add a Use Case section on the Wiki and write a short
description of the project but in the meantime you can find more details on
http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html and of
course http://www.similarpages.com/ itself.
Best,
Julien Nioche
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
Re: Nutch use case : SimilarPages
Posted by Markus Jelsma <ma...@buyways.nl>.
Interesting, looks very much like a service built here in the Netherlands.
On Tuesday 28 September 2010 13:56:27 Julien Nioche wrote:
> Dear Nutch Users,
>
> FYI, I blogged yesterday about an interesting use case of Nutch. We've
> helped the guys at SimilarPages to use Nutch on EC2 for a super large crawl
> (3 billion docs parsed), which was then used with a bit of MapReduce
> magic to find similarities between web pages.
>
> I will probably add a Use Case section on the Wiki and write a short
> description of the project but in the meantime you can find more details on
> http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html and of
> course http://www.similarpages.com/ itself.
>
> Best,
>
> Julien Nioche
>
Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350