
Posted to common-dev@hadoop.apache.org by Jaliya Ekanayake <jn...@gmail.com> on 2010/02/10 19:07:16 UTC

Twister: Iterative MapReduce

Hi All,

We would like to announce the first open source release of the Twister
framework for iterative MapReduce computations.

The MapReduce programming model has simplified the implementation of many
data-parallel applications. The simplicity of the programming model and the
quality of service provided by many MapReduce implementations have attracted
a lot of enthusiasm in the parallel computing community. From years of
experience in applying the MapReduce programming model to various scientific
applications, we identified a set of extensions to the programming model and
improvements to its architecture that will expand the applicability of
MapReduce to more classes of applications.

Twister is a lightweight MapReduce runtime that we have developed by
incorporating these enhancements. We have published several scientific
papers [1-5] explaining the key concepts and comparing it with other
MapReduce implementations such as Hadoop and DryadLINQ. Today we would like
to announce its first release.

Key features of Twister are:

    Distinction between static and variable data
    Configurable long-running (cacheable) map/reduce tasks
    Pub/sub messaging-based communication and data transfers
    Combine phase to collect all reduce outputs
    Efficient support for iterative MapReduce computations
    Data access via local disks
    Lightweight (5600 lines of code)
    Tools to manage data
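To show how the static/variable data split, cacheable map tasks and the combine phase fit together in an iterative computation, here is a minimal single-process sketch using k-means. It is an illustration under stated assumptions, not Twister's implementation or API: the names CachedMapTask, reduce_cluster, combine and run_kmeans are hypothetical, the map tasks run one after another instead of on separate nodes, and plain function calls stand in for pub/sub messaging.

```python
# Hypothetical sketch, not Twister's API: iterative MapReduce k-means in one
# process. Static data (the points) stays inside long-running map tasks;
# variable data (the centroids) is passed to them on every iteration; a
# combine step gathers all reduce outputs for the driver's next iteration.
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Partial = Tuple[float, float, int]  # (sum of x, sum of y, point count)


class CachedMapTask:
    """Long-running map task that keeps its static partition across iterations."""

    def __init__(self, points: List[Point]) -> None:
        self.points = points  # static data: loaded once, reused every iteration

    def map(self, centroids: List[Point]) -> Dict[int, Partial]:
        """Assign each cached point to its nearest centroid; emit partial sums."""
        partial: Dict[int, Partial] = {}
        for x, y in self.points:
            k = min(range(len(centroids)),
                    key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2)
            sx, sy, n = partial.get(k, (0.0, 0.0, 0))
            partial[k] = (sx + x, sy + y, n + 1)
        return partial


def reduce_cluster(partials: List[Partial]) -> Point:
    """Reduce: merge the partial sums for one cluster into its new centroid."""
    n = sum(p[2] for p in partials)
    return (sum(p[0] for p in partials) / n, sum(p[1] for p in partials) / n)


def combine(reduced: Dict[int, Point], old: List[Point]) -> List[Point]:
    """Combine: collect all reduce outputs into one centroid list.
    A cluster that received no points keeps its previous centroid."""
    return [reduced.get(k, old[k]) for k in range(len(old))]


def run_kmeans(partitions: List[List[Point]], centroids: List[Point],
               max_iters: int = 20, tol: float = 1e-9) -> Tuple[List[Point], int]:
    tasks = [CachedMapTask(p) for p in partitions]  # configured once, then reused
    for it in range(1, max_iters + 1):
        grouped: Dict[int, List[Partial]] = defaultdict(list)
        for task in tasks:  # map phase (would run in parallel on a cluster)
            for k, v in task.map(centroids).items():
                grouped[k].append(v)
        reduced = {k: reduce_cluster(vs) for k, vs in grouped.items()}
        new = combine(reduced, centroids)
        shift = max(abs(a[0] - b[0]) + abs(a[1] - b[1])
                    for a, b in zip(new, centroids))
        centroids = new
        if shift <= tol:  # converged: the driver stops iterating
            return centroids, it
    return centroids, max_iters
```

For example, with partitions [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]] and initial centroids [(0.0, 0.0), (10.0, 10.0)], run_kmeans returns ([(0.0, 0.5), (10.0, 10.5)], 2). Only the small centroid list moves between iterations; the points never leave their map tasks.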

We would like to share the design decisions and ideas we have incorporated
into Twister with you all, and we would be very grateful for your thoughts on
them. For more details, please visit www.iterativemapreduce.org, and let us
know about your experience using Twister.

SALSA <http://salsaweb.indiana.edu/salsa/> HPC Team.

Thank you,

Jaliya Ekanayake

Phone:  Work +1 812-855-2990, Cell +1 812-606-0561

Web: www.cs.indiana.edu/~jekanaya

[1]. Jaliya Ekanayake (Advisor: Geoffrey Fox), Architecture and Performance
of Runtime Environments for Data Intensive Scalable Computing, Doctoral
Showcase, SuperComputing 2009.
<http://grids.ucs.indiana.edu/ptliupages/publications/SC09-abstract-jaliya-ekanayake.pdf>

[2]. Jaliya Ekanayake, Atilla Soner Balkir, Thilina Gunarathne, Geoffrey
Fox, Christophe Poulain, Nelson Araujo, Roger Barga, DryadLINQ for Scientific
Analyses, Fifth IEEE International Conference on e-Science (eScience2009),
Oxford, UK.
<http://grids.ucs.indiana.edu/ptliupages/publications/eScience09-camera-ready-submission.pdf>

[3]. Jaliya Ekanayake, Xiaohong Qiu, Thilina Gunarathne, Scott Beason,
Geoffrey Fox, High Performance Parallel Computing with Clouds and Cloud
Technologies, Technical Report, August 25, 2009 (to appear as a book
chapter).
<http://grids.ucs.indiana.edu/ptliupages/publications/cloud_handbook_final-with-diagrams.pdf>

[4]. Geoffrey Fox, Seung-Hee Bae, Jaliya Ekanayake, Xiaohong Qiu, and
Huapeng Yuan, Parallel Data Mining from Multicore to Cloudy Grids, High
Performance Computing and Grids Workshop, 2008. An extended version of this
paper will appear as a book chapter.
<http://grids.ucs.indiana.edu/ptliupages/publications/CetraroWriteupJune11-09.pdf>

[5]. Jaliya Ekanayake, Shrideep Pallickara, Geoffrey Fox, MapReduce for Data
Intensive Scientific Analyses, Fourth IEEE International Conference on
eScience, 2008, pp. 277-284.
<http://grids.ucs.indiana.edu/ptliupages/publications/ekanayake-MapReduce.pdf>

Re: Twister: Iterative MapReduce

Posted by Ted Dunning <te...@gmail.com>.

Applicable to tiny clusters only.  There is no fault tolerance, and all data
is streamed from map to reduce.  There is also no distributed store (they
depend on NFS or local data copies).

That is highly effective for I/O-bound algorithms like k-means on small
clusters.  Small clusters make fault tolerance much less important, of
course.  More recent versions of Hadoop do a much better job of avoiding
spills to disk and are likely to show much better performance.

The other win for Twister is the preservation of map tasks across multiple
iterations.  This is clearly a big win for some algorithms (most notably
k-means).
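Much of the benefit of keeping map tasks alive across iterations comes from how often the static input has to be read. The following minimal sketch counts partition reads for two hypothetical drivers: one that creates fresh map tasks for every iteration (one job per iteration) and one that reuses cached tasks. CountingStore, run and the toy update rule are illustrative assumptions, not Hadoop or Twister code; the counter stands in for disk or network reads.

```python
# Hypothetical sketch: counts static-partition reads when map tasks are
# recreated every iteration versus kept alive across iterations.
# The store and the toy update rule are illustrative assumptions only.
from typing import List


class CountingStore:
    """Stands in for a disk or distributed store; counts partition reads."""

    def __init__(self, partitions: List[List[float]]) -> None:
        self.partitions = partitions
        self.loads = 0

    def load(self, i: int) -> List[float]:
        self.loads += 1
        return list(self.partitions[i])


def run(store: CountingStore, iterations: int, reuse_tasks: bool) -> float:
    n_parts = len(store.partitions)
    # Cached tasks read their static partition once, before the first iteration.
    cached = [store.load(i) for i in range(n_parts)] if reuse_tasks else []
    estimate = 0.0  # variable data broadcast to every map task each iteration
    for _ in range(iterations):
        # Fresh tasks must re-read every partition on every iteration.
        parts = cached if reuse_tasks else [store.load(i) for i in range(n_parts)]
        # Map: per-partition sum of residuals; reduce: add them up.
        residual = sum(sum(x - estimate for x in p) for p in parts)
        count = sum(len(p) for p in parts)
        estimate += 0.5 * residual / count  # toy update: move halfway to the mean
    return estimate
```

With four partitions and 10 iterations, the per-iteration driver performs 40 partition reads and the cached driver 4, while both compute the same estimate. The gap grows linearly with the number of iterations, which is why iterative algorithms such as k-means benefit most.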

I would say that overall, this is nice but definitely not ready for prime
time.

On Wed, Feb 10, 2010 at 10:10 AM, Robin Anil <ro...@gmail.com> wrote:

> Well, things seem to be heating up. We'd better start refactoring :)
>

Fwd: Twister: Iterative MapReduce

Posted by Robin Anil <ro...@gmail.com>.

Well, things seem to be heating up. We'd better start refactoring :)

Robin

---------- Forwarded message ----------
From: Jaliya Ekanayake <jn...@gmail.com>
Date: Wed, Feb 10, 2010 at 11:37 PM
Subject: Twister: Iterative MapReduce
To: common-dev@hadoop.apache.org
