Posted to user@crunch.apache.org by Jeremy Lewi <je...@lewi.us> on 2013/12/08 20:30:51 UTC

Crunch and Contrail

Hi Crunch Users and Developers,

Just wanted to let you know that we're starting to explore the use of
Crunch for Contrail
(http://sourceforge.net/apps/mediawiki/contrail-bio/index.php?title=Contrail).
Contrail is a bioinformatics application written on top of Hadoop. Since
the algorithm includes several sequences of MR jobs, Crunch would be very
useful, both because of its more convenient programming model and because
of the potential for improved execution performance through pipeline
optimization.

I hit what appeared to be some compatibility issues with Crunch 0.8.0 and
Hadoop 1.2.1, but building Crunch from HEAD seemed to fix them.

Our first example of Crunch was a simple word-count-like pipeline for
collecting graph statistics. This was a breeze to write, especially compared
to writing the equivalent MR jobs.

Contrail uses Avro extensively, so Crunch's Avro support is critical for us.

Thanks
Jeremy

Re: Crunch and Contrail

Posted by Josh Wills <jo...@gmail.com>.
Hey Jeremy,

On Sun, Dec 8, 2013 at 11:44 AM, Brock Noland <br...@cloudera.com> wrote:

> Awesome!
>
>
> On Sun, Dec 8, 2013 at 1:30 PM, Jeremy Lewi <je...@lewi.us> wrote:
>
>> Hi Crunch Users and Developers
>>
>> Just wanted to let you know that we're starting to explore the use of
>> Crunch for Contrail
>> (http://sourceforge.net/apps/mediawiki/contrail-bio/index.php?title=Contrail).
>> Contrail is a bioinformatics application written on top of Hadoop. Since
>> the algorithm includes several sequences of MR jobs, Crunch would be very
>> useful, both because of its more convenient programming model and because
>> of the potential for improved execution performance through pipeline
>> optimization.
>>
>> I hit what appeared to be some compatibility issues with Crunch 0.8.0
>> and Hadoop 1.2.1, but building Crunch from HEAD seemed to fix them.
>>
>
Yeah, that was a mistake I made when doing the 0.8.0 binary release. Crunch
0.8.1 is otherwise identical to 0.8.0, and its binaries are corrected to
work with both hadoop1 and hadoop2.


>
>> Our first example of Crunch was a simple word-count-like pipeline for
>> collecting graph statistics. This was a breeze to write, especially
>> compared to writing the equivalent MR jobs.
>>
>> Contrail uses Avro extensively, so Crunch's Avro support is critical for
>> us.
>>
>
I'm so glad to hear it. Please let us know if there's anything else we can
help with.

Josh


Re: Crunch and Contrail

Posted by Brock Noland <br...@cloudera.com>.
Awesome!


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org