Posted to common-dev@hadoop.apache.org by sam liu <sa...@gmail.com> on 2015/05/04 04:08:06 UTC

Re: Failed to run distcp against ftp server installed on Windows.

Hi Experts,

Are there any comments on this issue?

Thanks!

2015-04-29 10:35 GMT+08:00 sam liu <sa...@gmail.com>:

> For the IIS FTP server on Windows, the distcp tool seems to always fail at the
> line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in
> hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java#connect()
>
> I opened a JIRA for this issue: HADOOP-11886
>
> 2015-04-27 16:36 GMT+08:00 sam liu <sa...@gmail.com>:
>
>> Hi Experts,
>>
>> It is really weird that DistCp can successfully get the file from the
>> FileZilla FTP server on Windows 7, but fails against the IIS FTP server on
>> the same Windows 7 machine (although I can fetch the file directly with
>> wget: 'wget ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt'). I tried
>> several times; every attempt failed, with different error messages, as
>> shown below.
>>
>> Any comments?
>>
>> *[Success on FileZilla ftp server on Windows7]:*
>> [hdfs@hostname2.com ~]$ hadoop distcp
>> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
>> 15/04/26 22:56:20 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
>> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1429858372957_0002
>> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
>> application_1429858372957_0002
>> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
>> http://hostname2.com:8088/proxy/application_1429858372957_0002/
>> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
>> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
>> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running
>> in uber mode : false
>> 15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%
>>
>> *[Failure 1 on IIS FTP server on the same Windows 7 OS]:*
>> [hdfs@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
>> 15/04/27 00:02:45 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
>> org.apache.hadoop.tools.CopyListing$InvalidInputException:
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>>
>> *[Failure 2 on IIS FTP server on the same Windows 7 OS]:*
>> [biadmin@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>         at
>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> *[Failure 3 on IIS FTP server on the same Windows 7 OS]:*
>> [hdfs@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
>> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
>> java.net.SocketException: Connection reset
>>         at java.net.SocketInputStream.read(SocketInputStream.java:196)
>>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>>         at java.io.BufferedReader.fill(BufferedReader.java:154)
>>         at java.io.BufferedReader.read(BufferedReader.java:175)
>>         at
>> org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>>         at
>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>>
>> Thanks!
>>
>>
>> 2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:
>>
>>> Hi Experts,
>>>
>>> I can run distcp against an FTP server installed on Linux, but NOT
>>> against an FTP server installed on Windows. The steps are below.
>>>
>>> Is this a DistCp bug? Any comments?
>>>
>>> [Scenario 1]
>>> I installed a BI cluster from a trunk build on HadoopNode1, and could then
>>> copy a file from an FTP server installed on Linux to HDFS using the command:
>>> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
>>> hdfs://HadoopNode1:9000/tmp/
>>>
>>> [Scenario 2]
>>> On the same Hadoop node, I can copy a file from a remote FTP server
>>> installed on Windows 7 using the command:
>>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt
>>>
>>> But I failed to copy the file from the FTP server installed on Windows 7
>>> to HDFS using the command:
>>> [user1@HadoopNode1 ~]$ hadoop distcp
>>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>>> targetPathExists=true}
>>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>>> HadoopNode1/9.30.239.166:8032
>>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>>> closed without indication.
>>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>>         at
>>> org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>>         at
>>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>>         at
>>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>>         at
>>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>>         at
>>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>>         at
>>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>>         at
>>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>>
>>> Thanks!
>>>
>>
>>
>
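
One way to narrow this down outside Hadoop might be to check how the server answers the FTP command behind the failing call. As far as I know, Commons Net's setFileTransferMode(FTP.BLOCK_TRANSFER_MODE) sends "MODE B", and RFC 959 treats 2xx replies as success. The sketch below is only an assumption-laden probe, not part of Hadoop or Commons Net: the class name FtpModeProbe and its helper methods are made up for illustration, it talks to the control connection with plain sockets, and I have not verified what IIS actually replies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not from Hadoop or Commons Net).
class FtpModeProbe {

    // Returns the 3-digit code at the start of an FTP reply line, or -1 if there is none.
    static int replyCode(String line) {
        if (line == null || line.length() < 3) {
            return -1;
        }
        for (int i = 0; i < 3; i++) {
            if (!Character.isDigit(line.charAt(i))) {
                return -1;
            }
        }
        return Integer.parseInt(line.substring(0, 3));
    }

    // RFC 959: replies in the 2xx range mean positive completion.
    static boolean isPositiveCompletion(int code) {
        return code >= 200 && code < 300;
    }

    // Reads one reply and returns its final line. A multi-line reply starts
    // with "NNN-" and ends with a line starting with "NNN ".
    static String readReply(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line == null) {
            throw new IOException("connection closed by server");
        }
        if (line.length() > 3 && line.charAt(3) == '-') {
            String last = line.substring(0, 3) + " ";
            do {
                line = in.readLine();
                if (line == null) {
                    throw new IOException("connection closed inside a multi-line reply");
                }
            } while (!line.startsWith(last));
        }
        return line;
    }

    // Sends one command terminated by CRLF and returns the final reply line.
    static String send(Writer out, BufferedReader in, String command) throws IOException {
        out.write(command + "\r\n");
        out.flush();
        return readReply(in);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.err.println("usage: FtpModeProbe <host> <port> <user> <password>");
            return;
        }
        try (Socket socket = new Socket(args[0], Integer.parseInt(args[1]))) {
            socket.setSoTimeout(30000); // give up if the server stays silent for 30 s
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII);

            System.out.println(readReply(in));                    // server greeting
            System.out.println(send(out, in, "USER " + args[2]));
            System.out.println(send(out, in, "PASS " + args[3]));
            String modeReply = send(out, in, "MODE B");           // block transfer mode
            System.out.println(modeReply);
            System.out.println("block mode accepted: "
                + isPositiveCompletion(replyCode(modeReply)));
            send(out, in, "QUIT");
        }
    }
}
```

Run against both servers from this thread (for example with arguments hostname1.com 21 Viewer passw0rd for IIS), the printed reply to MODE B should show whether the server accepts block mode, rejects it with an error code, or drops the connection.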