Posted to common-dev@hadoop.apache.org by "Waldmann, Lukas" <lu...@merck.com> on 2017/12/04 16:23:51 UTC

FTP/FTPS/SFTP filesystem module

Hi fellow developers,
I would like to ask you to review and test a new module which I hope will replace the current implementation of the FTP filesystem.
The patch is available here: https://issues.apache.org/jira/browse/HADOOP-14444
Benefits of the new module:
* Support for FTP, FTPS and SFTP protocols
* Support for HTTP/SOCKS proxies
* Support for passive FTP
* Support for explicit FTPS
* Support for connection pooling - a new connection is not created for every single command but is reused from the pool. For a huge number of files it shows an order-of-magnitude performance improvement over non-pooled connections.
* Caching of directory trees. With FTP you always need to list the whole directory whenever you ask for information about a particular file. Again, for a huge number of files it shows an order-of-magnitude performance improvement over uncached listings.
* Support for keep-alive (NOOP) messages to avoid connection drops
* Support for Unix-style or regexp wildcard globs - useful for listing particular files across a whole directory tree
* Support for re-establishing broken FTP data transfers - these can happen surprisingly often
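
To illustrate the directory caching point above, here is a minimal sketch. It is not code from the patch: the class name ListingCache, its methods, and the lister function (which stands in for a real FTP LIST call) are all hypothetical. It only shows why caching helps: the first lookup in a directory pays for one listing, and later lookups in the same directory are answered from memory.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a per-directory listing cache (not the patch's implementation). */
public class ListingCache {
    // Stands in for a remote FTP LIST call: directory path -> entry names.
    private final Function<String, List<String>> lister;
    // Listings that have already been fetched, keyed by directory path.
    private final Map<String, List<String>> cache = new HashMap<>();
    // Counts how often the (expensive) remote listing was actually performed.
    private int remoteCalls = 0;

    public ListingCache(Function<String, List<String>> lister) {
        this.lister = lister;
    }

    /** Returns true if the named entry is present in the directory. */
    public boolean exists(String dir, String name) {
        List<String> entries = cache.get(dir);
        if (entries == null) {
            // Cache miss: list the whole directory once and remember the result.
            remoteCalls++;
            entries = lister.apply(dir);
            cache.put(dir, entries);
        }
        return entries.contains(name);
    }

    public int remoteCalls() {
        return remoteCalls;
    }
}
```

A real cache would also need invalidation after writes, deletes and renames, and probably an expiry time; the sketch leaves all of that out.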

Thank you for your cooperation

Lukas
