Posted to common-user@hadoop.apache.org by "Agarwal, Nikhil" <Ni...@netapp.com> on 2013/03/26 10:19:19 UTC

How to tell my Hadoop cluster to read data from an external server

Hi,

I have a Hadoop cluster up and running. I want to submit an MR job to it, but the input data is kept on an external server (outside the Hadoop cluster). Can anyone please suggest how I can tell my Hadoop cluster to load the input data from the external server and then run MR on it?

Thanks & Regards,
Nikhil

Re: How to tell my Hadoop cluster to read data from an external server

Posted by Nitin Pawar <ni...@gmail.com>.
You are looking at a two-step workflow here.

The first unit of your workflow will download the file from the external
server, write it to DFS, and return the file path. The second unit will read
that input path and process the data according to your business logic in MR.
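
A minimal sketch of that first unit, assuming the external server exposes the
file over plain HTTP; the source URL and the HDFS destination path below are
hypothetical placeholders you would replace with your own:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FetchToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);       // the cluster's default FS (HDFS)

    // Hypothetical source URL and destination path -- replace with your own.
    URL src = new URL("http://external-server.example.com/data.txt");
    Path dst = new Path("/user/nikhil/input/data.txt");

    try (InputStream in = src.openStream();
         OutputStream out = fs.create(dst)) {
      IOUtils.copyBytes(in, out, conf, false);  // stream the file into HDFS
    }
    System.out.println("Wrote " + dst);         // this path feeds the MR job
  }
}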

You can look at Cascading for this simple approach; it's easy to build a
simple workflow application using Cascading. Other options are Oozie, or you
may try Crunch (it's very new but easy to use as well).
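
Whichever tool drives the workflow, the second unit is just an ordinary MR job
whose input path is the HDFS location written by the first unit. A minimal
driver sketch, using the same hypothetical paths as above and the identity
Mapper/Reducer as stand-ins for your business logic:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProcessDownloadedData {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "process downloaded data"); // newer releases prefer Job.getInstance(conf, ...)
    job.setJarByClass(ProcessDownloadedData.class);

    // Identity mapper/reducer as placeholders -- substitute your own classes.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Input is the HDFS path produced by the first unit of the workflow.
    FileInputFormat.addInputPath(job, new Path("/user/nikhil/input/data.txt"));
    FileOutputFormat.setOutputPath(job, new Path("/user/nikhil/output"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}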



On Tue, Mar 26, 2013 at 2:49 PM, Agarwal, Nikhil <Ni...@netapp.com> wrote:

> Hi,
>
> I have a Hadoop cluster up and running. I want to submit an MR job to it,
> but the input data is kept on an external server (outside the Hadoop
> cluster). Can anyone please suggest how I can tell my Hadoop cluster to
> load the input data from the external server and then run MR on it?
>
> Thanks & Regards,
>
> Nikhil
>



-- 
Nitin Pawar
