Posted to user@oozie.apache.org by ZORAIDA HIDALGO SANCHEZ <zo...@tid.es> on 2013/11/27 12:59:56 UTC

Oozie and R

Hi all,

Has anybody experimented with R and Oozie? We have a customized ETL running on Hadoop that is orchestrated by Oozie. Once the data is loaded into HDFS, we download it and apply some heuristics locally. We want to continue using R (we are not considering Mahout or RMR for now), but we need to integrate the training step into our workflow (once the main tuning has been done, just for new data).

We would appreciate it if someone could share their experience with us.

Regards,

Zoraida.-


Re: Oozie and R

Posted by Serega Sheypak <se...@gmail.com>.
I don't see any.
We do use a custom rmr build managed with Puppet.


2013/11/27 ZORAIDA HIDALGO SANCHEZ <zo...@tid.es>

> Thanks Serega,
>
> Does the second option have any other advantage besides avoiding
> installing R on each node?
>

Re: Oozie and R

Posted by ZORAIDA HIDALGO SANCHEZ <zo...@tid.es>.
Thanks Serega,

Does the second option have any other advantage besides avoiding installing
R on each node?

On 27/11/13 13:12, "Serega Sheypak" <se...@gmail.com> wrote:

>You can try wrapping these steps into an Oozie java action.
>The java action is executed as a map-only job on a random node, using only
>one mapper.
>
>1. Quick and dirty
>1.1. Download the data from HDFS to the local disk (use the Java API or a
>shell script).
>1.2. Run your R app, feeding it the downloaded data.
>Problem: you have to install R on every TaskTracker node, because you
>cannot know in advance which node will run the java action's mapper.
>
>2. More complicated
>Split your workflow into two coordinators.
>2.1. The first coordinator prepares the data.
>2.2. It then runs a java action. This action uses ssh to connect to a
>single known node and runs a script / pushes an event / notifies it by
>URL. That script/app should:
>2.2.1. copy the data locally
>2.2.2. run the R application
>
>The second coordinator waits for a special flag.
>The R app from 2.2.2 should produce that flag at the end (if it finishes
>successfully).
>When the second coordinator sees the flag in the defined folder, it
>continues its work.
>
>Hope it helps. Anyway, none of these solutions are "Hadoop patterns",
>which is why they are ugly.



Re: Oozie and R

Posted by Serega Sheypak <se...@gmail.com>.
You can try wrapping these steps into an Oozie java action.
The java action is executed as a map-only job on a random node, using only
one mapper.

1. Quick and dirty
1.1. Download the data from HDFS to the local disk (use the Java API or a
shell script).
1.2. Run your R app, feeding it the downloaded data.
Problem: you have to install R on every TaskTracker node, because you
cannot know in advance which node will run the java action's mapper.
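
For illustration, a minimal sketch of how option 1 could be wired in the
workflow definition. The class name, arguments and transition targets are
placeholders (not from this thread); the hypothetical driver class would use
the HDFS Java API (e.g. FileSystem.copyToLocalFile) to pull the data down and
then exec Rscript on it:

    <action name="train-r">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- hypothetical driver: copies ${inputDir} from HDFS to a local
                 temp directory, then launches Rscript on the local copy -->
            <main-class>com.example.RTrainingDriver</main-class>
            <arg>${inputDir}</arg>
            <arg>/tmp/r-training</arg>
        </java>
        <ok to="next-step"/>
        <error to="fail"/>
    </action>

Wherever this action's mapper happens to run, that node needs R installed,
which is exactly the drawback described above.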

2. More complicated
Split your workflow into two coordinators.
2.1. The first coordinator prepares the data.
2.2. It then runs a java action. This action uses ssh to connect to a
single known node and runs a script / pushes an event / notifies it by
URL. That script/app should:
2.2.1. copy the data locally
2.2.2. run the R application

The second coordinator waits for a special flag.
The R app from 2.2.2 should produce that flag at the end (if it finishes
successfully).
When the second coordinator sees the flag in the defined folder, it
continues its work.
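
A minimal sketch of the "special flag" part using Oozie's done-flag mechanism
on a coordinator dataset. Names, paths and the frequency are made-up
placeholders, and it assumes the R wrapper from 2.2.2 writes the flag file
into the output directory on success:

    <coordinator-app name="post-r-coord" frequency="${coord:days(1)}"
                     start="${start}" end="${end}" timezone="UTC"
                     xmlns="uri:oozie:coordinator:0.4">
        <datasets>
            <dataset name="r-output" frequency="${coord:days(1)}"
                     initial-instance="${start}" timezone="UTC">
                <uri-template>${nameNode}/data/r-output/${YEAR}${MONTH}${DAY}</uri-template>
                <!-- the coordinator only materializes an action once this
                     flag exists in the dataset directory -->
                <done-flag>_R_SUCCESS</done-flag>
            </dataset>
        </datasets>
        <input-events>
            <data-in name="trained" dataset="r-output">
                <instance>${coord:current(0)}</instance>
            </data-in>
        </input-events>
        <action>
            <workflow>
                <app-path>${nameNode}/apps/post-r-workflow</app-path>
            </workflow>
        </action>
    </coordinator-app>

While the flag is absent, the coordinator action simply stays in WAITING
state, which matches the "waits for a special flag" behaviour described
above.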

Hope it helps. Anyway, none of these solutions are "Hadoop patterns",
which is why they are ugly.

2013/11/27 ZORAIDA HIDALGO SANCHEZ <zo...@tid.es>

> Hi all,
>
> Has anybody experimented with R and Oozie? We have a customized ETL
> running on Hadoop that is orchestrated by Oozie. Once the data is loaded
> into HDFS, we download it and apply some heuristics locally. We want to
> continue using R (we are not considering Mahout or RMR for now), but we
> need to integrate the training step into our workflow (once the main
> tuning has been done, just for new data).
>
> We would appreciate it if someone could share their experience with us.
>
> Regards,
>
> Zoraida.-
>