Posted to common-user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2015/04/27 10:36:05 UTC

Re: Failed to run distcp against ftp server installed on Windows.

Hi Experts,

It is odd that DistCp can successfully fetch a file from a FileZilla FTP
server on Windows 7, but fails against the IIS FTP server on the same
Windows 7 machine, even though I can fetch the file directly with wget:
'wget ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt'. I tried several
times; every attempt failed, with different error messages, as shown below.

Any comments?

*[Success on FileZilla ftp server on Windows7]:*
[hdfs@hostname2.com ~]$ hadoop distcp ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
15/04/26 22:56:20 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1429858372957_0002
15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application application_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job: http://hostname2.com:8088/proxy/application_1429858372957_0002/
15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running in uber mode : false
15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%

*[Failure 1 on IIS ftp server on the same Windows7 OS]:*
[hdfs@hostname2.com ~]$ hadoop distcp ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
15/04/27 00:02:45 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

*[Failure 2 on IIS ftp server on the same Windows7 OS]:*
[biadmin@hostname2.com ~]$ hadoop distcp ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
15/02/01 23:03:37 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp, targetPathExists=true}
15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8032
15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed without indication.
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)

*[Failure 3 on IIS ftp server on the same Windows7 OS]:*
[hdfs@hostname2.com ~]$ hadoop distcp ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
15/04/27 00:08:18 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:154)
        at java.io.BufferedReader.read(BufferedReader.java:175)
        at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

Thanks!


2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:

> Hi Experts,
>
> I can run distcp against an FTP server installed on Linux, but NOT against
> an FTP server installed on Windows. The steps are below.
>
> Is this a DistCp bug? Any comments?
>
> [Scenario 1]
> I installed a BI cluster using a trunk build on HadoopNode1, and could then
> copy a file from an FTP server installed on Linux to HDFS with this command:
> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt hdfs://HadoopNode1:9000/tmp/
>
> [Scenario 2]
> On the same Hadoop node, I can copy a file from a remote FTP server
> installed on Windows 7 with this command:
> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt
>
> But copying a file from the FTP server on Windows 7 to HDFS fails with this
> command:
> [user1@HadoopNode1 ~]$ hadoop distcp ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp, targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at HadoopNode1/9.30.239.166:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> Thanks!
>

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
Hi Experts,

Are there any comments on this issue?

Thanks!

2015-04-29 10:35 GMT+08:00 sam liu <sa...@gmail.com>:

> For the IIS FTP server on Windows, it seems that DistCp always fails at the
> line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in connect() in
> hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java
>
> I opened a JIRA for this issue: HADOOP-11886

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
For the IIS FTP server on Windows, it seems that DistCp always fails at the
line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in connect() in
hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java

I opened a JIRA for this issue: HADOOP-11886
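
One way to check the server side of this independently of Hadoop is to send the MODE command by hand. As far as I know, Commons Net's setFileTransferMode(FTP.BLOCK_TRANSFER_MODE) issues `MODE B` on the control connection, so the sketch below logs in over a raw socket and prints how the server answers `MODE S` (stream) and `MODE B` (block). The class name FtpModeProbe and its helper methods are hypothetical and not part of Hadoop or Commons Net; it uses only the JDK and is meant as a diagnostic sketch under those assumptions, not a fix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone probe; not part of Hadoop or Commons Net. */
public class FtpModeProbe {

    /** Extracts the 3-digit reply code from an FTP reply line, or -1 if malformed. */
    public static int replyCode(String line) {
        if (line == null || line.length() < 3) {
            return -1;
        }
        try {
            return Integer.parseInt(line.substring(0, 3));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** 2xx replies are "positive completion" replies in RFC 959. */
    public static boolean isPositiveCompletion(String line) {
        int code = replyCode(line);
        return code >= 200 && code < 300;
    }

    /** Reads one (possibly multi-line) reply and returns its final line. */
    public static String readReply(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line == null) {
            throw new IOException("Connection closed without reply");
        }
        // A multi-line reply starts with "NNN-" and ends with a line starting "NNN ".
        if (line.length() > 3 && line.charAt(3) == '-') {
            String end = line.substring(0, 3) + " ";
            while (!line.startsWith(end)) {
                line = in.readLine();
                if (line == null) {
                    throw new IOException("Connection closed mid-reply");
                }
            }
        }
        return line;
    }

    /** Sends one command terminated by CRLF and returns the final reply line. */
    public static String send(Writer out, BufferedReader in, String cmd) throws IOException {
        out.write(cmd + "\r\n");
        out.flush();
        return readReply(in);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.err.println("usage: FtpModeProbe <host> <port> <user> <password>");
            return;
        }
        try (Socket s = new Socket(args[0], Integer.parseInt(args[1]))) {
            s.setSoTimeout(30_000); // fail fast instead of hanging for minutes
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.US_ASCII);
            System.out.println("greeting: " + readReply(in));
            System.out.println("USER -> " + send(out, in, "USER " + args[2]));
            System.out.println("PASS -> " + send(out, in, "PASS " + args[3]));
            for (String mode : new String[] {"S", "B"}) {
                String reply = send(out, in, "MODE " + mode);
                System.out.println("MODE " + mode + " -> " + reply
                        + (isPositiveCompletion(reply) ? " (accepted)" : " (rejected)"));
            }
            System.out.println("QUIT -> " + send(out, in, "QUIT"));
        }
    }
}
```

It could be run as, for example, `java FtpModeProbe hostname1.com 21 Viewer <password>`. If the server rejects `MODE B`, or drops the connection right after it, that would be consistent with the failure at setFileTransferMode; I have not verified what IIS actually returns.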

>> hdfs://HadoopNode1:9000/tmp/
>>
>> [Scenario 2]
>> On the same hadoop node, I can copy file from a remote ftp server
>> installed on Windows7 using command:
>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt.
>>
>> But I failed to copy file from a ftp installed on Windows7 to hdfs using
>> command:
>> [user1@HadoopNode1 ~]$ hadoop distcp
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> HadoopNode1/9.30.239.166:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>         at
>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> Thanks!
>>
>
>

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
For the IIS ftp server on Windows, it seems that DistCp always fails at the
line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in connect() of
hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java

I opened a JIRA for this issue: HADOOP-11886
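
One way to check the block-mode theory outside Hadoop is a small probe like the
one below. It is only a sketch under assumptions: it uses plain JDK sockets
instead of commons-net, the class name FtpModeProbe and its helper methods are
made up for illustration, and host, port, user and password are placeholders
passed on the command line. It logs in and sends MODE S and then MODE B (the
RFC 959 commands for stream and block transfer mode), printing the server's
reply to each.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop or commons-net.
public class FtpModeProbe {

    /** Returns the three-digit code that starts an FTP reply line, or -1 if there is none. */
    public static int replyCode(String line) {
        if (line == null || line.length() < 3) {
            return -1;
        }
        for (int i = 0; i < 3; i++) {
            if (!Character.isDigit(line.charAt(i))) {
                return -1;
            }
        }
        return Integer.parseInt(line.substring(0, 3));
    }

    /** Reads one complete reply (single- or multi-line) and returns its final line. */
    public static String readReply(BufferedReader in) {
        try {
            String first = in.readLine();
            if (first == null) {
                throw new IOException("connection closed before a reply arrived");
            }
            if (replyCode(first) == -1) {
                throw new IOException("malformed reply: " + first);
            }
            // "NNN-text" opens a multi-line reply; anything else is already complete.
            if (first.length() == 3 || first.charAt(3) != '-') {
                return first;
            }
            // A multi-line reply ends with a line starting with the same code plus a space.
            String terminator = first.substring(0, 3) + " ";
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(terminator)) {
                    return line;
                }
            }
            throw new IOException("connection closed inside a multi-line reply");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends one command terminated by CRLF and returns the final line of the reply. */
    public static String send(Writer out, BufferedReader in, String command) {
        try {
            out.write(command + "\r\n");
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return readReply(in);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.err.println("usage: java FtpModeProbe <host> <port> <user> <password>");
            return;
        }
        try (Socket socket = new Socket(args[0], Integer.parseInt(args[1]));
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
             Writer out = new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.US_ASCII)) {
            socket.setSoTimeout(30_000); // give up on a silent server after 30 seconds
            System.out.println("greeting: " + readReply(in));
            System.out.println("USER:     " + send(out, in, "USER " + args[2]));
            System.out.println("PASS:     " + send(out, in, "PASS " + args[3]));
            System.out.println("MODE S:   " + send(out, in, "MODE S"));
            System.out.println("MODE B:   " + send(out, in, "MODE B"));
            System.out.println("QUIT:     " + send(out, in, "QUIT"));
        }
    }
}
```

It could be run as 'java FtpModeProbe hostname1.com 21 Viewer <password>'
against both the IIS and the FileZilla server. A reply to MODE B that is not
2xx, or a dropped connection right after it, would be consistent with the
failure described above, although it would not prove the cause by itself.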

2015-04-27 16:36 GMT+08:00 sam liu <sa...@gmail.com>:

> Hi Experts,
>
> It is really weird that DistCp could successfully get the file from
> FileZilla ftp server on Windows7, but failed from the IIS ftp server on the
> same Windows7 OS(but I can get file using wget directly: 'wget
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt' ). I tried several
> times, but all failed and encountered different error messages as below.
>
> Any comments?
>
> *[Success on FileZilla ftp server on Windows7]:*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
> 15/04/26 22:56:20 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1429858372957_0002
> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
> application_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
> http://hostname2.com:8088/proxy/application_1429858372957_0002/
> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running
> in uber mode : false
> 15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%
>
> *[Failure 1 on  IIS ftp server on the same Windows7 OS] :*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:02:45 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
> org.apache.hadoop.tools.CopyListing$InvalidInputException:
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> *[Failure 2 on  IIS ftp server on the same Windows7 OS] :*
> [biadmin@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> *[Failure 3 on  IIS ftp server on the same Windows7 OS] :*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:196)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:154)
>         at java.io.BufferedReader.read(BufferedReader.java:175)
>         at
> org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> Thanks!
>
>
> 2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:
>
>> Hi Experts,
>>
>> I could run distcp against ftp server installed on Linux, but could NOT
>> run distcp against ftp server installed on Windows. Below are the steps.
>>
>> Is this a DistCp bug? Any comments?
>>
>> [Scenario 1]
>> I installed a BI cluster using trunk build on HadoopNode1, and then could
>> copy file from a ftp installed on Linux to hdfs using command:
>> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
>> hdfs://HadoopNode1:9000/tmp/
>>
>> [Scenario 2]
>> On the same hadoop node, I can copy file from a remote ftp server
>> installed on Windows7 using command:
>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt.
>>
>> But I failed to copy file from a ftp installed on Windows7 to hdfs using
>> command:
>> [user1@HadoopNode1 ~]$ hadoop distcp
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> HadoopNode1/9.30.239.166:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>         at
>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> Thanks!
>>
>
>

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
for IIS ftp server on Windows, seems the distcp tool always failed on the
line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in
hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java#connect()

Opened a jira for this issue: HADOOP-11886

2015-04-27 16:36 GMT+08:00 sam liu <sa...@gmail.com>:

> Hi Experts,
>
> It is really weird that DistCp could successfully get the file from
> FileZilla ftp server on Windows7, but failed from the IIS ftp server on the
> same Windows7 OS(but I can get file using wget directly: 'wget
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt' ). I tried several
> times, but all failed and encountered different error messages as below.
>
> Any comments?
>
> *[Success on FileZilla ftp server on Windows7]:*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
> 15/04/26 22:56:20 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1429858372957_0002
> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
> application_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
> http://hostname2.com:8088/proxy/application_1429858372957_0002/
> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running
> in uber mode : false
> 15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%
>
> *[Failure 1 on  IIS ftp server on the same Windows7 OS] :*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:02:45 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
> org.apache.hadoop.tools.CopyListing$InvalidInputException:
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> *[Failure 2 on IIS ftp server on the same Windows7 OS]:*
> [biadmin@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> *[Failure 3 on IIS ftp server on the same Windows7 OS]:*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:196)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:154)
>         at java.io.BufferedReader.read(BufferedReader.java:175)
>         at
> org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> Thanks!
>
>
> 2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:
>
>> Hi Experts,
>>
>> I can run distcp against an ftp server installed on Linux, but NOT
>> against an ftp server installed on Windows. The steps are below.
>>
>> Is this a DistCp bug? Any comments?
>>
>> [Scenario 1]
>> I installed a BI cluster from a trunk build on HadoopNode1, and could then
>> copy a file from an ftp server installed on Linux to HDFS using the command:
>> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
>> hdfs://HadoopNode1:9000/tmp/
>>
>> [Scenario 2]
>> On the same Hadoop node, I can copy a file from a remote ftp server
>> installed on Windows7 using the command:
>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt
>>
>> But I failed to copy a file from the ftp server installed on Windows7 to
>> HDFS using the command:
>> [user1@HadoopNode1 ~]$ hadoop distcp
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> HadoopNode1/9.30.239.166:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>         at
>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> Thanks!
>>
>
>
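
One way to narrow this down outside Hadoop is to repeat by hand what the FTPFileSystem.getFileStatus frames in the traces are presumably doing: log in, list the parent directory, and look for the file name. The sketch below does that with Python's standard ftplib, once in passive and once in active data-connection mode. It is only a diagnostic aid under stated assumptions: the helper names (parse_ftp_url, listing_has, probe) are made up for this example, the name match is a simple end-of-line comparison and not Hadoop's own listing parser, and nothing in the thread confirms that the data-connection mode is the cause. In both "Exception encountered" runs the error is logged a little over two minutes after the RMProxy line, which would be consistent with a stalled listing, but that is a guess.

```python
from ftplib import FTP, all_errors
from posixpath import basename, dirname
from urllib.parse import unquote, urlsplit


def parse_ftp_url(url):
    """Split an ftp:// URL into the pieces needed for a manual probe."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp:// URL: %r" % url)
    path = unquote(parts.path) or "/"
    return {
        "host": parts.hostname,
        "port": parts.port or 21,  # default FTP control port
        "user": unquote(parts.username or "anonymous"),
        "password": unquote(parts.password or ""),
        "directory": dirname(path) or "/",
        "name": basename(path),
    }


def listing_has(lines, name):
    """Return True if any raw LIST line ends with the given file name.

    Assumption: the file name is the last field of each line, which holds
    for the common Unix-style and MS-DOS-style listings.
    """
    for line in lines:
        text = line.rstrip()
        if text == name or text.endswith(" " + name):
            return True
    return False


def probe(url, passive, timeout=30):
    """List the parent directory of the URL and report whether the file shows up."""
    target = parse_ftp_url(url)
    ftp = FTP()
    ftp.connect(target["host"], target["port"], timeout=timeout)
    try:
        ftp.login(target["user"], target["password"])
        ftp.set_pasv(passive)  # True: client opens the data connection
        lines = []
        ftp.retrlines("LIST " + target["directory"], lines.append)
        return listing_has(lines, target["name"]), lines
    finally:
        try:
            ftp.quit()
        except all_errors:
            ftp.close()  # server already dropped the control connection


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 2:
        sys.exit("usage: ftp_probe.py ftp://user:password@host:port/path")
    for passive in (True, False):
        mode = "passive" if passive else "active"
        try:
            found, lines = probe(sys.argv[1], passive)
            print("%s mode: found=%s" % (mode, found))
            for line in lines:
                print("  " + line)
        except all_errors as exc:
            print("%s mode: failed with %r" % (mode, exc))
```

Run it from the Hadoop node with the same URL that was passed to distcp (the script name ftp_probe.py is hypothetical). If the listing works in one mode and stalls or resets in the other, a firewall or the IIS data-channel settings would be a plausible place to look next. If both modes list the file, the raw listing lines printed by the script show which directory format the IIS server returns, which may be worth comparing with the FileZilla server that works.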