Posted to common-dev@hadoop.apache.org by Ian Holsman <li...@holsman.net> on 2008/03/11 05:12:14 UTC
PROPOSAL: Summer of Code 2008 - Integrate Talend with Hadoop.
I'd like to volunteer a proposal for the upcoming summer of code project.
Talend is an open source (GPL) data integration tool used by companies to
transform data from one format to another.
For example, I might get 2-3 XML input files that I need to feed into a
database or Solr server. It works really well until you start bumping
into memory limits or time concerns when you handle large files.
Enter Hadoop.
I'd like to propose a project to write the necessary bits to make
Talend jobs run on a Hadoop cluster, possibly using things like Pig.
While I understand this code will probably end up as part of Talend's
code base, I think it would be a neat project to expand Hadoop's
presence in this space.
I'm willing to act as a mentor for it. (I've been a mentor for HTTP and
Lucene projects in the past.)
regards
Ian