Posted to common-dev@hadoop.apache.org by Raghavendra TK <ra...@gmail.com> on 2008/04/06 08:41:49 UTC

GSOC 2008: Hadoop-talend integration project

Hi,

I came across the hadoop-talend-integration project on the Google SoC
list of projects handled by the ASF. The mentor is listed as Ian Holsman. I
wanted some more details about the project but could not find his e-mail
address. I have just subscribed to this mailing list and am not aware of
any discussions of this project in this group. Does anyone know more about
it? I have a background in ETL processes and am interested in this
project. Can anyone here give me a point of contact or links for it?

The description of the project says: "Integrating it with Hadoop will allow
it to process larger files." Has anything been done in this respect already,
or is it a completely new task?

Thanks,
Raghavendra

Re: GSOC 2008: Hadoop-talend integration project

Posted by Raghavendra TK <ra...@gmail.com>.
I have uploaded the proposal. You can download it from here
<http://www.cc.gatech.edu/%7Erags121/GSOC2008_Talend_Hadoop_proposal.pdf>.

Basically, I have proposed a MapReduce implementation of the Talend jobs so
that they can be run on Hadoop. Comments are welcome.

Thanks,
Raghavendra

On Sun, Apr 6, 2008 at 4:56 PM, Ian Holsman <li...@holsman.net> wrote:

> Raghavendra TK wrote:
>
> > Hi,
> >
> > I came across the hadoop-talend-integration project on the Google SoC
> > list of projects handled by the ASF. The mentor is listed as Ian Holsman.
> > I wanted some more details about the project but could not find his
> > e-mail address. I have just subscribed to this mailing list and am not
> > aware of any discussions of this project in this group. Does anyone know
> > more about it? I have a background in ETL processes and am interested in
> > this project. Can anyone here give me a point of contact or links for it?
> >
>
> That would be me.
> I would prefer that you keep questions on the list, so that others can see
> the answers.
>
> As for what the project is about...
> Talend has a series of transformations that can be run against an input
> stream. This SoC project is to allow those transforms to be run on a Hadoop
> farm instead of a single machine.
>
> I'm guessing this would require writing a new set of transforms.
>
> > The description of the project says: "Integrating it with Hadoop will
> > allow it to process larger files." Has anything been done in this respect
> > already, or is it a completely new task?
> >
>
> It is a completely new task.
>
> Part of the task is to scope the work so that you can complete the project
> in the time available. I would be happy with 1-2 working transforms to show
> that it can be done, if that is all that can be done.
>
> Regards
> Ian
>
> > Thanks,
> > Raghavendra
> >
>

Re: GSOC 2008: Hadoop-talend integration project

Posted by Ian Holsman <li...@holsman.net>.
Raghavendra TK wrote:
> Hi,
>
> I came across the hadoop-talend-integration project on the Google SoC
> list of projects handled by the ASF. The mentor is listed as Ian Holsman. I
> wanted some more details about the project but could not find his e-mail
> address. I have just subscribed to this mailing list and am not aware of
> any discussions of this project in this group. Does anyone know more about
> it? I have a background in ETL processes and am interested in this
> project. Can anyone here give me a point of contact or links for it?
>
>   

That would be me.
I would prefer that you keep questions on the list, so that others can see
the answers.

As for what the project is about...
Talend has a series of transformations that can be run against an input
stream. This SoC project is to allow those transforms to be run on a
Hadoop farm instead of a single machine.

I'm guessing this would require writing a new set of transforms.

> The description of the project says: "Integrating it with Hadoop will allow
> it to process larger files." Has anything been done in this respect already,
> or is it a completely new task?
>   

It is a completely new task.

Part of the task is to scope the work so that you can complete the
project in the time available. I would be happy with 1-2 working
transforms to show that it can be done, if that is all that can be done.

Regards
Ian
> Thanks,
> Raghavendra
>
>