Posted to common-issues@hadoop.apache.org by "sam liu (JIRA)" <ji...@apache.org> on 2015/04/29 04:19:07 UTC

[jira] [Commented] (HADOOP-11886) Failed to run distcp against ftp server installed on Windows

    [ https://issues.apache.org/jira/browse/HADOOP-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518597#comment-14518597 ] 

sam liu commented on HADOOP-11886:
----------------------------------

[Scenario 1]
I installed a BI cluster from a trunk build on HadoopNode1, and could then copy a file from an FTP server installed on Linux to HDFS using the command:
hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt hdfs://HadoopNode1:9000/tmp/

[Scenario 2]
[Success with a FileZilla FTP server on Windows 7]:
[hdfs@hostname2.com ~]$ hadoop distcp ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
15/04/26 22:56:20 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1429858372957_0002
15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application application_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job: http://hostname2.com:8088/proxy/application_1429858372957_0002/
15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running in uber mode : false
15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%

[Scenario 3]
On the same Hadoop node, I can copy a file from a remote FTP server installed on Windows 7 using the command:
wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt

But I failed to copy the same file from that FTP server on Windows 7 to HDFS using the command:
[user1@HadoopNode1 ~]$ hadoop distcp ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
15/02/01 23:03:37 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp, targetPathExists=true}
15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at HadoopNode1/9.30.239.166:8032
15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed without indication.
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
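
The trace above fails inside FTP.quit(), called from FTPFileSystem.disconnect() after getFileStatus(). One rough way to check, outside Hadoop, whether the Windows server closes the control connection before replying to QUIT is a small probe like the one below. This is only a sketch: it uses Python's standard ftplib rather than the commons-net client that FTPFileSystem uses, so it approximates the failing sequence and does not reproduce it exactly. The function name probe_ftp_logout is hypothetical.

```python
import ftplib


def probe_ftp_logout(ftp, user, password, path):
    """Log in, ask for one file's size, then log out.

    `ftp` is an already-connected ftplib.FTP (or any object exposing the
    same login/voidcmd/size/quit/close methods). Returns a dict with the
    reported size and whether the server answered QUIT before closing.
    """
    ftp.login(user, password)
    result = {"size": None, "clean_logout": True}
    try:
        # Some servers only answer SIZE in binary mode.
        ftp.voidcmd("TYPE I")
        result["size"] = ftp.size(path)
    finally:
        try:
            ftp.quit()
        except (EOFError, OSError, ftplib.Error):
            # ftplib raises EOFError when the control connection is
            # closed without a reply, which is the closest analogue to
            # "Connection closed without indication" in the trace.
            result["clean_logout"] = False
            ftp.close()
    return result


# Hypothetical usage with the values from Scenario 3:
#   ftp = ftplib.FTP()
#   ftp.connect("9.126.148.79", 21, timeout=30)
#   print(probe_ftp_logout(ftp, "Viewer", "password1", "ftp-win.txt"))
```

If clean_logout comes back False for the Windows server but True for the Linux and FileZilla servers, that would suggest the difference is in how the server handles the end of the session, not in DistCp itself.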


> Failed to run distcp against ftp server installed on Windows
> ------------------------------------------------------------
>
>                 Key: HADOOP-11886
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11886
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>            Reporter: sam liu
>            Assignee: sam liu
>            Priority: Blocker
>
> distcp runs against an FTP server installed on Linux, but does NOT run against an FTP server installed on Windows (such as the IIS FTP service). However, distcp works well with a FileZilla FTP server installed on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)