Posted to common-dev@hadoop.apache.org by sam liu <sa...@gmail.com> on 2015/05/04 04:08:06 UTC
Re: Failed to run distcp against ftp server installed on Windows.
Hi Experts,
Are there any comments on this issue?
Thanks!
2015-04-29 10:35 GMT+08:00 sam liu <sa...@gmail.com>:
> With the IIS FTP server on Windows, the distcp tool seems to always fail at
> the line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in
> hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java#connect()
>
> I opened a JIRA for this issue: HADOOP-11886
>
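One way to check the observation above independently of Hadoop is to send the FTP command behind 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)', which is 'MODE B' in RFC 959, over a raw control connection and read the reply. The following is only a minimal sketch using the JDK standard library, not code from Hadoop or Commons Net. The class name FtpModeProbe and its helper methods are hypothetical, and whether IIS actually rejects block mode is an assumption to verify, not something this thread confirms.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper: logs in and asks the server for block mode.
public class FtpModeProbe {

    /** Parses the three-digit code at the start of an FTP reply line; -1 if malformed. */
    public static int replyCode(String line) {
        if (line == null || line.length() < 3) {
            return -1;
        }
        for (int i = 0; i < 3; i++) {
            if (!Character.isDigit(line.charAt(i))) {
                return -1;
            }
        }
        return Integer.parseInt(line.substring(0, 3));
    }

    /** In RFC 959, 2xx replies mean the command completed successfully. */
    public static boolean isPositiveCompletion(int code) {
        return code >= 200 && code < 300;
    }

    /** Reads one reply (possibly multi-line) and returns its final line. */
    public static String readReply(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line == null) {
            throw new IOException("connection closed by server");
        }
        // A multi-line reply starts with "ddd-" and ends with a line starting "ddd ".
        if (line.length() > 3 && line.charAt(3) == '-') {
            String end = line.substring(0, 3) + " ";
            while (!line.startsWith(end)) {
                line = in.readLine();
                if (line == null) {
                    throw new IOException("connection closed by server");
                }
            }
        }
        return line;
    }

    /** Sends one command terminated by CRLF and returns the final reply line. */
    public static String command(Writer out, BufferedReader in, String cmd) throws IOException {
        out.write(cmd + "\r\n");
        out.flush();
        return readReply(in);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: FtpModeProbe <host> <user> <password> [port]");
            return;
        }
        int port = args.length > 3 ? Integer.parseInt(args[3]) : 21;
        try (Socket socket = new Socket(args[0], port)) {
            socket.setSoTimeout(30_000); // fail fast instead of hanging for minutes
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII);

            System.out.println(readReply(in));                       // server greeting
            System.out.println(command(out, in, "USER " + args[1]));
            System.out.println(command(out, in, "PASS " + args[2]));

            String reply = command(out, in, "MODE B");               // request block mode
            System.out.println(reply);
            System.out.println(isPositiveCompletion(replyCode(reply))
                    ? "block mode accepted"
                    : "block mode not accepted; stream mode (MODE S) is the RFC 959 default");

            out.write("QUIT\r\n");
            out.flush();
        }
    }
}
```

Run it with the host and credentials used for distcp, for example 'java FtpModeProbe hostname1.com Viewer <password> 21'. If the FileZilla server replies with a 2xx code to 'MODE B' and the IIS server does not, that would support the diagnosis above.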
> 2015-04-27 16:36 GMT+08:00 sam liu <sa...@gmail.com>:
>
>> Hi Experts,
>>
>> It is really weird: DistCp can successfully get a file from the FileZilla
>> FTP server on Windows 7, but fails against the IIS FTP server on the same
>> Windows 7 machine (even though I can get the file directly with wget: 'wget
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt'). I tried several
>> times; every attempt failed, with the different error messages shown below.
>>
>> Any comments?
>>
>> *[Success on FileZilla ftp server on Windows7]:*
>> [hdfs@hostname2.com ~]$ hadoop distcp
>> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
>> 15/04/26 22:56:20 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
>> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1429858372957_0002
>> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
>> application_1429858372957_0002
>> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
>> http://hostname2.com:8088/proxy/application_1429858372957_0002/
>> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
>> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
>> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running
>> in uber mode : false
>> 15/04/26 22:56:51 INFO mapreduce.Job: map 0% reduce 0%
>>
>> *[Failure 1 on IIS ftp server on the same Windows7 OS] :*
>> [hdfs@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
>> 15/04/27 00:02:45 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
>> org.apache.hadoop.tools.CopyListing$InvalidInputException:
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
>> at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>> at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>> at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>>
>> *[Failure 2 on IIS ftp server on the same Windows7 OS] :*
>> [biadmin@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>> at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>> at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>> at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>> at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>> at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>> at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>> at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>> at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>> at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>> at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>> at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> *[Failure 3 on IIS ftp server on the same Windows7 OS] :*
>> [hdfs@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
>> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
>> java.net.SocketException: Connection reset
>> at java.net.SocketInputStream.read(SocketInputStream.java:196)
>> at java.net.SocketInputStream.read(SocketInputStream.java:122)
>> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>> at java.io.InputStreamReader.read(InputStreamReader.java:184)
>> at java.io.BufferedReader.fill(BufferedReader.java:154)
>> at java.io.BufferedReader.read(BufferedReader.java:175)
>> at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>> at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>> at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>> at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>> at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>> at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>> at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>> at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>> at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>> at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>> at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>> at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>>
>> Thanks!
>>
>>
>> 2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:
>>
>>> Hi Experts,
>>>
>>> I can run distcp against an FTP server installed on Linux, but NOT
>>> against an FTP server installed on Windows. The steps are below.
>>>
>>> Is this a DistCp bug? Any comments?
>>>
>>> [Scenario 1]
>>> I installed a BI cluster from a trunk build on HadoopNode1 and could then
>>> copy a file from an FTP server installed on Linux to HDFS using the command:
>>> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
>>> hdfs://HadoopNode1:9000/tmp/
>>>
>>> [Scenario 2]
>>> On the same Hadoop node, I can copy a file from a remote FTP server
>>> installed on Windows 7 using the command:
>>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt
>>>
>>> But I failed to copy the file from the FTP server installed on Windows 7
>>> to HDFS using the command:
>>> [user1@HadoopNode1 ~]$ hadoop distcp
>>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>>> targetPathExists=true}
>>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>>> HadoopNode1/9.30.239.166:8032
>>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>>> closed without indication.
>>> at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>> at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>> at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>> at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>> at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>> at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>> at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>> at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>> at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>> at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>> at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>> at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>>
>>> Thanks!
>>>
>>
>>
>