Posted to mapreduce-user@hadoop.apache.org by Mohan Radhakrishnan <ra...@gmail.com> on 2014/07/07 16:02:08 UTC
Managed File Transfer
Hi,
We used a commercial file transfer (FT) and scheduler tool in clustered mode.
This was a traditional active-active cluster that supported multiple
protocols such as FTPS.
Now I am interested in evaluating a distributed way of crawling FTP
sites and downloading files using Hadoop. Since we have to process
thousands of files, I thought Hadoop jobs could do it.
Are Hadoop jobs used for this type of file transfer?
There is also a requirement for a scheduler. What does the forum
recommend?
Thanks,
Mohan
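[The "thousands of files" workload above maps onto Hadoop's model of splitting work across many tasks. A minimal sketch of that idea in plain Python, with no Hadoop dependency: round-robin a file listing into one group per worker, which is roughly what a job would do with one input split per group. The file names are made up for illustration.]

```python
def partition_files(files, n_workers):
    """Round-robin the file list into n_workers groups.

    Each group would become the work unit for one map task
    (or one downloader process) in a distributed transfer job.
    """
    groups = [[] for _ in range(n_workers)]
    for i, name in enumerate(files):
        groups[i % n_workers].append(name)
    return groups

# Hypothetical listing of files discovered on a remote FTP site.
listing = ["report_%04d.csv" % i for i in range(10)]
for worker_id, group in enumerate(partition_files(listing, 3)):
    print(worker_id, group)
```

Round-robin keeps the groups balanced in count; a real job might instead balance by file size so each task transfers a similar number of bytes.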
Re: Managed File Transfer
Posted by Mohan Radhakrishnan <ra...@gmail.com>.
I am a beginner, but this seems to be similar to what I intend. The data
source will be external FTP or S3 storage.
"Spark Streaming can read data from HDFS
<http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html>,
Flume <http://flume.apache.org/>, Kafka <http://kafka.apache.org/>, Twitter
<https://dev.twitter.com/> and ZeroMQ <http://zeromq.org/>. You can also
define your own custom data sources."
Thanks,
Mohan
On Wed, Jul 9, 2014 at 2:09 PM, Stanley Shi <ss...@gopivotal.com> wrote:
> There's a DistCP utility for this kind of purpose;
> Also there's "Spring XD" there, but I am not sure if you want to use it.
>
> Regards,
> *Stanley Shi,*
>
>
>
> On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan <
> radhakrishnan.mohan@gmail.com> wrote:
>
>> Hi,
>> We used a commercial FT and scheduler tool in clustered mode.
>> This was a traditional active-active cluster that supported multiple
>> protocols like FTPS etc.
>>
>> Now I am interested in evaluating a Distributed way of crawling FTP
>> sites and downloading files using Hadoop. I thought since we have to
>> process thousands of files Hadoop jobs can do it.
>>
>> Are Hadoop jobs used for this type of file transfers ?
>>
>> Moreover there is a requirement for a scheduler also. What is the
>> recommendation of the forum ?
>>
>>
>> Thanks,
>> Mohan
>>
>
>
Re: Managed File Transfer
Posted by Stanley Shi <ss...@gopivotal.com>.
There's a DistCp utility for this kind of purpose.
There's also "Spring XD", but I am not sure if you want to use it.
Regards,
*Stanley Shi,*
On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan <
radhakrishnan.mohan@gmail.com> wrote:
> Hi,
> We used a commercial FT and scheduler tool in clustered mode.
> This was a traditional active-active cluster that supported multiple
> protocols like FTPS etc.
>
> Now I am interested in evaluating a Distributed way of crawling FTP
> sites and downloading files using Hadoop. I thought since we have to
> process thousands of files Hadoop jobs can do it.
>
> Are Hadoop jobs used for this type of file transfers ?
>
> Moreover there is a requirement for a scheduler also. What is the
> recommendation of the forum ?
>
>
> Thanks,
> Mohan
>
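[The DistCp suggestion above is a single `hadoop distcp <source> <destination>` command line. A small sketch of building that invocation from Python, so it could be fired by whatever scheduler is chosen; the host names and paths are placeholders, not real endpoints.]

```python
def distcp_command(src, dest, overwrite=False):
    """Return the `hadoop distcp` argument list for one source/destination pair.

    Returned as a list so it can be passed directly to subprocess.run()
    on a node that has the hadoop client installed.
    """
    cmd = ["hadoop", "distcp"]
    if overwrite:
        cmd.append("-overwrite")
    cmd += [src, dest]
    return cmd

# Placeholder endpoints for illustration only.
cmd = distcp_command("ftp://user@ftp.example.com/incoming",
                     "hdfs://namenode:8020/landing/incoming")
print(" ".join(cmd))
```

A scheduler (cron, Oozie, or the commercial tool being replaced) would then just invoke this command on the configured interval.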