Posted to general@hadoop.apache.org by Kevin Burton <bu...@gmail.com> on 2011/12/27 07:32:49 UTC

Peregrine: A new map reduce framework for iterative/pipelined jobs.

I'm pleased to announce Peregrine 0.5.0 - a new map reduce framework
optimized for iterative and pipelined map reduce jobs.

http://peregrine_mapreduce.bitbucket.org/

This originally started off with some internal work at Spinn3r to build a
fast and efficient PageRank implementation.  We realized that what we
wanted was an MR runtime optimized for this type of work, which differs
radically from the traditional Hadoop design.

Peregrine implements a partitioned distributed filesystem where key/value
pairs are routed to defined partitions.  This enables work to be joined
against previous iterations or different units of work by the same key on
the same local system.
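
To illustrate the idea, here is a minimal sketch of deterministic key
routing followed by a partition-local join of two iterations.  It is not
Peregrine's actual code or API: the class name PartitionRoutingSketch, the
methods partitionFor and route, the simple hash function and the sample
keys are all assumptions made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Peregrine's API: route keys to fixed partitions
// so that two datasets can be joined by key within a single partition.
public class PartitionRoutingSketch {

    // Deterministically map a key to a partition in [0, numPartitions).
    // The hash function here is an assumption chosen for illustration.
    static int partitionFor(String key, int numPartitions) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        return Math.floorMod(hash, numPartitions);
    }

    // Place every key/value pair into the partition its key maps to.
    static List<Map<String, Double>> route(Map<String, Double> pairs,
                                           int numPartitions) {
        List<Map<String, Double>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Double> e : pairs.entrySet()) {
            int p = partitionFor(e.getKey(), numPartitions);
            partitions.get(p).put(e.getKey(), e.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;

        // Made-up values standing in for two iterations of a job.
        Map<String, Double> previous = new TreeMap<>();
        previous.put("page-a", 0.25);
        previous.put("page-b", 0.75);
        Map<String, Double> current = new TreeMap<>();
        current.put("page-a", 0.40);
        current.put("page-b", 0.60);

        List<Map<String, Double>> prevParts = route(previous, numPartitions);
        List<Map<String, Double>> currParts = route(current, numPartitions);

        // Join partition by partition: the same key always lands in the
        // same partition, so no lookup ever crosses a partition boundary.
        for (int p = 0; p < numPartitions; p++) {
            for (Map.Entry<String, Double> e : currParts.get(p).entrySet()) {
                Double old = prevParts.get(p).get(e.getKey());
                System.out.printf("partition %d: %s previous=%s current=%s%n",
                        p, e.getKey(), old, e.getValue());
            }
        }
    }
}
```

Because both iterations use the same key-to-partition function, each
partition can be joined independently on whichever machine holds it.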

Peregrine is optimized for ETL jobs where the primary data storage system
is an external database such as Cassandra, HBase, MySQL, etc.  Jobs are
then run as Extract, Transform and Load stages, with intermediate data
being stored in the Peregrine FS.

We enable features such as Map/Reduce/Merge as well as some additional
functionality like ExtractMap and ReduceLoad (in ETL parlance).

A key innovation here is a partitioning layout algorithm that can support
fast many-to-many recovery, similar to HDFS, while still supporting
partitioned operation with deterministic key placement.
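
As a rough illustration of how those two goals can coexist, the sketch
below assigns each partition a fixed primary host and spreads the replicas
of one host's partitions across several other hosts.  This is NOT
Peregrine's layout algorithm (see the design documentation for that); the
class ReplicaLayoutSketch, its methods and the round-robin offset scheme
are assumptions invented for this example.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical layout sketch, not Peregrine's algorithm: every partition
// has a fixed primary host, and the replicas of one host's partitions are
// spread over the other hosts so that recovery can read from many of them.
public class ReplicaLayoutSketch {

    // Fixed primary placement: partition -> host.
    static int primaryHost(int partition, int numHosts) {
        return partition % numHosts;
    }

    // Replica placement: shift by an offset in [1, numHosts - 1] that
    // varies between the partitions sharing a primary host.
    static int replicaHost(int partition, int numHosts) {
        if (numHosts < 2) {
            throw new IllegalArgumentException("need at least two hosts");
        }
        int primary = primaryHost(partition, numHosts);
        int offset = 1 + (partition / numHosts) % (numHosts - 1);
        return (primary + offset) % numHosts;
    }

    // Hosts that hold a replica of some partition owned by failedHost.
    static Set<Integer> recoverySources(int failedHost, int numHosts,
                                        int numPartitions) {
        Set<Integer> sources = new TreeSet<>();
        for (int p = 0; p < numPartitions; p++) {
            if (primaryHost(p, numHosts) == failedHost) {
                sources.add(replicaHost(p, numHosts));
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        // With 4 hosts and 12 partitions, host 0 owns partitions 0, 4, 8
        // and their replicas land on hosts 1, 2 and 3 respectively.
        System.out.println(recoverySources(0, 4, 12)); // prints [1, 2, 3]
    }
}
```

Since a partition's placement depends only on its number and the host
count, key placement stays deterministic, while a failed host can be
rebuilt from several peers in parallel rather than from a single mirror.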

We've also tried to optimize for single instance performance and use modern
IO primitives as much as possible.  This includes NOT shying away from
operating system specific features such as mlock, fadvise, fallocate, etc.

There is still a bit more work I want to do before I am ready to benchmark
it against Hadoop.  Instead of implementing a synthetic benchmark we wanted
to get a production-ready version first which would allow people to port
existing applications and see what the before / after performance numbers
looked like in the real world.

For more information please see:

http://peregrine_mapreduce.bitbucket.org/

As well as our design documentation:

http://peregrine_mapreduce.bitbucket.org/design/



-- 

Founder/CEO Spinn3r.com <http://spinn3r.com/>

Location: San Francisco, CA
Skype: burtonator

Skype-in: (415) 871-0687