Posted to dev@tuscany.apache.org by Chris Trezzo <ct...@gmail.com> on 2008/05/22 02:18:30 UTC

[GSoC] Map-Reduce Integration with Tuscany

Hi everyone,

I have been fleshing out a slightly more detailed plan for the  
upcoming weeks, and would like to share it with the community and get  
some feedback.

For the first iteration I am planning to focus on Map-Reduce (MR)
applications implemented in Java. These apps interface directly with
Hadoop's MR framework[1], as opposed to Hadoop Streaming[2] or Pipes[3].

I think the first priority should be to get a basic MR data flow  
working, and the three necessary entities of a basic MR application  
seem to be the Mapper, the Reducer, and a job configuration. I am  
planning on getting the functionality for these three parts  
implemented first.
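
To make the three entities concrete, here is a minimal, Hadoop-free sketch of the data flow (map, group by key, reduce) run inside a single process. All of the names in it (MiniMapReduce, Mapper, Reducer, JobConfig, run, wordCount) are hypothetical and chosen only for illustration; they are not Hadoop's org.apache.hadoop.mapred types, and not a proposed Tuscany API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical sketch: these types only model the basic MR data flow.
// They are NOT Hadoop's Mapper/Reducer/JobConf and not Tuscany types.
class MiniMapReduce {

    /** Emits zero or more (key, value) pairs for one input record. */
    interface Mapper<I, K, V> {
        void map(I record, BiConsumer<K, V> emit);
    }

    /** Folds all values that share a key into one result. */
    interface Reducer<K, V, R> {
        R reduce(K key, List<V> values);
    }

    /** Stand-in for a job configuration: it only wires a mapper to a reducer. */
    static final class JobConfig<I, K, V, R> {
        final Mapper<I, K, V> mapper;
        final Reducer<K, V, R> reducer;

        JobConfig(Mapper<I, K, V> mapper, Reducer<K, V, R> reducer) {
            this.mapper = mapper;
            this.reducer = reducer;
        }
    }

    /** Runs map, groups the emitted pairs by key, then reduces each group. */
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            JobConfig<I, K, V, R> job, List<I> input) {
        // Map phase plus grouping (the "shuffle"), kept in memory and sorted by key.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I record : input) {
            job.mapper.map(record,
                (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Reduce phase: one call per distinct key.
        Map<K, R> output = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            output.put(e.getKey(), job.reducer.reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    /** Word count expressed with the sketch types above. */
    static Map<String, Integer> wordCount(List<String> lines) {
        Mapper<String, String, Integer> mapper = (line, emit) -> {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    emit.accept(word, 1);
                }
            }
        };
        Reducer<String, Integer, Integer> reducer = (word, counts) -> {
            int sum = 0;
            for (int c : counts) {
                sum += c;
            }
            return sum;
        };
        return run(new JobConfig<>(mapper, reducer), lines);
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(List.of("to be or not to be")));
    }
}
```

In this sketch the Mapper and Reducer carry the application logic while JobConfig only does the wiring, which roughly mirrors the split between implementation types and a management layer; a real job would additionally need input/output formats and cluster settings.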

Going along with the original design in the proposal, I am planning to  
view the Mapper and the Reducer as implementation types, and the job  
configuration as part of a management layer in charge of the assembly  
and deployment of MR applications. Initially the management layer  
would be responsible for the configuration of MR jobs and the  
integration with Hadoop's MR framework, with the overall goal of  
eventually extending it into something more along the lines of what  
was described by Robert Donkin[4] and Jean-Sebastien (referred to as  
item 3)[5]. In this case, the layer could be used to manage the  
deployment of components over a Hadoop cluster itself.

For the Mapper and Reducer, in the next couple of weeks I would like  
to outline the definition of these types and hopefully start  
implementing them.

For the management layer, I could use some guidance on how to best fit  
it into the Tuscany architectural framework.

Thoughts/Suggestions on any part of my plans are always greatly  
appreciated.

Congrats to everyone on graduation!

Thanks,
Chris Trezzo

[1] http://hadoop.apache.org/core/docs/r0.15.3/api/org/apache/hadoop/mapred/package-summary.html
[2] http://hadoop.apache.org/core/docs/current/streaming.html
[3] http://hadoop.apache.org/core/docs/r0.15.3/api/org/apache/hadoop/mapred/pipes/package-summary.html
[4] http://www.mail-archive.com/tuscany-dev@ws.apache.org/msg29711.html
[5] http://www.mail-archive.com/tuscany-dev@ws.apache.org/msg29720.html