Posted to dev@whirr.apache.org by Andrei Savu <sa...@gmail.com> on 2011/12/20 10:47:43 UTC

Processing Common Crawl data with Whirr

Here is an interesting article about how to process Common Crawl data using
Amazon EMR:
http://www.commoncrawl.org/mapreduce-for-the-masses/

I think we should be able to do something similar with Whirr quite easily.

I will give it a try soon.

-- Andrei Savu

Re: Processing Common Crawl data with Whirr

Posted by Tom White <to...@cloudera.com>.
This article by Mat Kelcey about processing Common Crawl data looks
interesting too:

http://matpalm.com/blog/2011/12/10/common_crawl_visible_text/

Cheers,
Tom

On Tue, Dec 20, 2011 at 1:47 AM, Andrei Savu <sa...@gmail.com> wrote:
> Here is an interesting article about how to process Common Crawl data using
> Amazon EMR:
> http://www.commoncrawl.org/mapreduce-for-the-masses/
>
> I think we should be able to do something similar with Whirr quite easily.
>
> I will give it a try soon.
>
> -- Andrei Savu