Posted to user@nutch.apache.org by Julien Nioche <li...@gmail.com> on 2010/09/28 13:56:27 UTC

Nutch use case : SimilarPages

Dear Nutch Users,

FYI, I blogged yesterday about an interesting use case of Nutch. We've
helped the guys at SimilarPages to use Nutch on EC2 for a very large crawl
(3 billion docs parsed), which they then used with a bit of MapReduce
magic to find similarities between web pages.

I will probably add a Use Case section on the Wiki and write a short
description of the project but in the meantime you can find more details on
http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html and of
course http://www.similarpages.com/ itself.

Best,

Julien Nioche

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Nutch use case : SimilarPages

Posted by Markus Jelsma <ma...@buyways.nl>.
Interesting, this looks very much like a service built here in the Netherlands.

On Tuesday 28 September 2010 13:56:27 Julien Nioche wrote:
> Dear Nutch Users,
> 
> FYI, I blogged yesterday about an interesting use case of Nutch. We've
> helped the guys at SimilarPages to use Nutch on EC2 for a very large crawl
> (3 billion docs parsed), which they then used with a bit of MapReduce
> magic to find similarities between web pages.
> 
> I will probably add a Use Case section on the Wiki and write a short
> description of the project but in the meantime you can find more details on
> http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html and of
> course http://www.similarpages.com/ itself.
> 
> Best,
> 
> Julien Nioche
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350