Posted to common-dev@hadoop.apache.org by "Ankur (JIRA)" <ji...@apache.org> on 2008/04/15 07:35:04 UTC

[jira] Commented: (HADOOP-3246) FTP client over HDFS

    [ https://issues.apache.org/jira/browse/HADOOP-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588912#action_12588912 ] 

Ankur commented on HADOOP-3246:
-------------------------------

Even though the two issues look the same, they are different.

HADOOP-3199 is about an FTP server that provides FTP access to data in HDFS. Any FTP client would then be able to access HDFS data through FTP.

This issue is about an FTP client that talks to remote FTP server(s), pulls data from them and stores it directly into HDFS.

At present our data sits on several remote FTP servers. Pulling a lot of data from these locations involves a lot of manual work: fetching the data over FTP, storing it locally and then putting it into HDFS. This is cumbersome, especially if the data is too large to fit into local storage.

This utility essentially provides the following benefits:
1. The steps 'pull data from FTP server', 'store locally', 'transfer to HDFS' and 'delete local copy' are collapsed into one step: 'pull data and store into HDFS'.
2. No need to worry about lack of local storage as data goes directly into HDFS.
3. Can be used to run a batch of commands that include pulling data from different FTP servers.

All of this greatly simplifies administrative tasks.
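
To illustrate benefits 1 and 2, here is a minimal sketch of the streaming idea, not the actual patch. It copies a source stream to a sink stream through a small fixed-size buffer, so at no point is the whole file held in memory or written to local disk. The class name StreamingCopy and its copy method are hypothetical and exist only for this example. In-memory streams stand in for both ends so the sketch runs on its own; in a real setup I would assume the source is the data stream returned by an FTP client library and the sink is an HDFS output stream (for example one obtained from Hadoop's FileSystem.create).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Hadoop or of this issue's patch.
final class StreamingCopy {
    private StreamingCopy() {}

    /**
     * Copies everything from source to sink using a fixed-size buffer.
     * Only bufferSize bytes are held at a time, so no local staging copy is needed.
     * Returns the number of bytes copied. The caller closes both streams.
     */
    static long copy(InputStream source, OutputStream sink, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; it may return fewer bytes than requested.
        while ((n = source.read(buffer)) != -1) {
            sink.write(buffer, 0, n);
            total += n;
        }
        sink.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this byte array stands in for a file served by a remote FTP server,
        // and the ByteArrayOutputStream stands in for an HDFS output stream.
        byte[] payload = "example remote file contents".getBytes(StandardCharsets.UTF_8);
        try (InputStream source = new ByteArrayInputStream(payload);
             ByteArrayOutputStream sink = new ByteArrayOutputStream()) {
            long copied = copy(source, sink, 8);
            System.out.println("copied " + copied + " bytes");
        }
    }
}
```

Because the copy only ever holds one buffer's worth of data, the size of the remote file is bounded by HDFS capacity rather than by local storage, which is the point of benefit 2.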

+1 for marking this as 'Not Duplicate'

> FTP client over HDFS
> --------------------
>
>                 Key: HADOOP-3246
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3246
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: util
>            Reporter: Ankur
>            Priority: Minor
>
> An FTP client that stores content directly into HDFS allows data from FTP servers to be stored directly into HDFS instead of first copying the data locally and then uploading it into HDFS. The benefits are apparent from an administrative perspective as large datasets can be pulled from FTP servers with minimal human intervention.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.