Posted to user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2015/02/02 08:41:53 UTC

Failed to run distcp against ftp server installed on Windows.

Hi Experts,

I can run DistCp against an FTP server installed on Linux, but NOT against
an FTP server installed on Windows. The steps are below.

Is this a DistCp bug? Any comments?

[Scenario 1]
I installed a BI cluster from a trunk build on HadoopNode1, and could then
copy a file from an FTP server installed on Linux to HDFS using the command:
hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt hdfs://HadoopNode1:9000/tmp/

[Scenario 2]
On the same Hadoop node, I can copy a file from a remote FTP server installed
on Windows 7 using the command:
wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt

But I failed to copy a file from the FTP server installed on Windows 7 to HDFS
using the command:
[user1@HadoopNode1 ~]$ hadoop distcp ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
15/02/01 23:03:37 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp, targetPathExists=true}
15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at HadoopNode1/9.30.239.166:8032
15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed without indication.
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)

Thanks!

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
Hi Experts,

Is there any comment on this issue?

Thanks!

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
For the IIS FTP server on Windows, it seems the DistCp tool always fails on the
line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in
hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java#connect()

I opened a JIRA for this issue: HADOOP-11886
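
One way to check how a given server reacts to block transfer mode, independently
of Hadoop, is to send the FTP command directly on the control connection. The
sketch below is only a diagnostic aid under stated assumptions: it assumes that
setFileTransferMode(FTP.BLOCK_TRANSFER_MODE) results in a `MODE B` command (as
defined in RFC 959), it uses plain JDK sockets rather than commons-net, and the
class name FtpModeProbe and its helper methods are hypothetical names introduced
here. It is not the code path that FTPFileSystem itself uses.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: logs in to an FTP server and reports its reply to "MODE B".
final class FtpModeProbe {

    // A final reply line is three digits followed by a space (or nothing).
    // Lines such as "230-Welcome" belong to a multi-line reply and are skipped.
    // Simplification: continuation lines that happen to look like this are not handled.
    static boolean isFinalReplyLine(String line) {
        return line.length() >= 3
                && Character.isDigit(line.charAt(0))
                && Character.isDigit(line.charAt(1))
                && Character.isDigit(line.charAt(2))
                && (line.length() == 3 || line.charAt(3) == ' ');
    }

    // 2xx replies mean the command completed successfully.
    static boolean isPositiveCompletion(String replyLine) {
        return isFinalReplyLine(replyLine) && replyLine.charAt(0) == '2';
    }

    // Reads lines until the final line of one reply arrives.
    static String readReply(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (isFinalReplyLine(line)) {
                return line;
            }
        }
        throw new EOFException("Server closed the control connection");
    }

    // Sends one command terminated by CRLF and returns the final reply line.
    static String send(BufferedReader in, Writer out, String command) throws IOException {
        out.write(command + "\r\n");
        out.flush();
        return readReply(in);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.err.println("usage: FtpModeProbe <host> <port> <user> <password>");
            return;
        }
        try (Socket socket = new Socket(args[0], Integer.parseInt(args[1]))) {
            socket.setSoTimeout(30_000); // fail after 30 s instead of hanging
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII);

            System.out.println(readReply(in));                      // server greeting
            System.out.println(send(in, out, "USER " + args[2]));
            System.out.println(send(in, out, "PASS " + args[3]));

            String modeReply = send(in, out, "MODE B");             // request block mode
            System.out.println(modeReply);
            System.out.println(isPositiveCompletion(modeReply)
                    ? "MODE B accepted" : "MODE B rejected");

            send(in, out, "QUIT");
        }
    }
}
```

With Java 11 or later this can be run as a single source file, for example
`java FtpModeProbe.java <host> 21 <user> <password>`. Comparing the `MODE B`
reply from the IIS server with the one from the FileZilla server may help
confirm whether block mode is where the two servers differ.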

2015-04-27 16:36 GMT+08:00 sam liu <sa...@gmail.com>:

> Hi Experts,
>
> It is really weird that DistCp can successfully get the file from a
> FileZilla FTP server on Windows 7, but fails against the IIS FTP server on the
> same Windows 7 machine (although I can get the file using wget directly: 'wget
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt'). I tried several
> times; every attempt failed, with different error messages, as shown below.
>
> Any comments?
>
> *[Success on FileZilla ftp server on Windows7]:*
> [hdfs@hostname2.com ~]$ hadoop distcp ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
> 15/04/26 22:56:20 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1429858372957_0002
> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application application_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job: http://hostname2.com:8088/proxy/application_1429858372957_0002/
> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running in uber mode : false
> 15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%
>
> *[Failure 1 on IIS ftp server on the same Windows7 OS]:*
> [hdfs@hostname2.com ~]$ hadoop distcp ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:02:45 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
> org.apache.hadoop.tools.CopyListing$InvalidInputException: ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
>         at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>         at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> *[Failure 2 on IIS ftp server on the same Windows7 OS]:*
> [biadmin@hostname2.com ~]$ hadoop distcp ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp, targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> *[Failure 3 on IIS ftp server on the same Windows7 OS]:*
> [hdfs@hostname2.com ~]$ hadoop distcp ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:08:18 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:196)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:154)
>         at java.io.BufferedReader.read(BufferedReader.java:175)
>         at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>         at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>         at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> Thanks!

>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> *[Failure 3 on  IIS ftp server on the same Windows7 OS] :*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:196)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:154)
>         at java.io.BufferedReader.read(BufferedReader.java:175)
>         at
> org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> Thanks!
>
>
> 2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:
>
>> Hi Experts,
>>
>> I could run distcp against ftp server installed on Linux, but could NOT
>> run distcp against ftp server installed on Windows. Below are the steps.
>>
>> Is this a DistCp bug? Any comments?
>>
>> [Scenario 1]
>> I installed a BI cluster using trunk build on HadoopNode1, and then could
>> copy file from a ftp installed on Linux to hdfs using command:
>> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
>> hdfs://HadoopNode1:9000/tmp/
>>
>> [Scenario 2]
>> On the same hadoop node, I can copy file from a remote ftp server
>> installed on Windows7 using command:
>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt.
>>
>> But I failed to copy file from a ftp installed on Windows7 to hdfs using
>> command:
>> [user1@HadoopNode1 ~]$ hadoop distcp
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> HadoopNode1/9.30.239.166:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>         at
>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> Thanks!
>>
>
>

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
For the IIS FTP server on Windows, it seems DistCp always fails at the
line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in
hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java#connect()

I opened a JIRA for this issue: HADOOP-11886
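
To make the failing step more concrete: as far as I know, that call asks the
server to switch from the FTP default stream mode to block mode, which on the
wire is the RFC 959 command 'MODE B'. Below is a minimal, self-contained
sketch (plain Java, no Hadoop or commons-net classes) of how a client could
interpret the server's reply and fall back to stream mode. The class name
FtpModeFallback, the fallback policy, and the sample reply strings are
assumptions for illustration only; they were not captured from the IIS
server, and this is not necessarily how HADOOP-11886 resolves the problem.

```java
public class FtpModeFallback {

    /** Transfer modes from RFC 959, with the argument sent after MODE. */
    enum TransferMode {
        STREAM("S"), BLOCK("B"), COMPRESSED("C");

        final String arg;

        TransferMode(String arg) {
            this.arg = arg;
        }

        /** Control-channel command that requests this mode. */
        String command() {
            return "MODE " + arg;
        }
    }

    /** Returns the three-digit code that starts an FTP reply line, or -1 if malformed. */
    static int replyCode(String replyLine) {
        if (replyLine == null || replyLine.length() < 3) {
            return -1;
        }
        try {
            return Integer.parseInt(replyLine.substring(0, 3));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** In FTP, 2xx replies are positive completions. */
    static boolean isPositiveCompletion(int code) {
        return code >= 200 && code < 300;
    }

    /**
     * Hypothetical policy: keep the requested mode if the server accepted it,
     * otherwise fall back to STREAM, the protocol default.
     */
    static TransferMode effectiveMode(TransferMode requested, String replyLine) {
        return isPositiveCompletion(replyCode(replyLine)) ? requested : TransferMode.STREAM;
    }

    public static void main(String[] args) {
        // Sample replies are made up for illustration, not taken from IIS.
        System.out.println(TransferMode.BLOCK.command());  // prints "MODE B"
        System.out.println(effectiveMode(TransferMode.BLOCK, "200 MODE set to B."));  // prints "BLOCK"
        System.out.println(effectiveMode(TransferMode.BLOCK,
                "504 Command not implemented for that parameter."));  // prints "STREAM"
    }
}
```

Note that in the traces quoted below the connection was closed or reset
rather than answered with an error reply, so a real client would also have
to treat a dropped control connection as a rejection and reconnect before
retrying in stream mode.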

2015-04-27 16:36 GMT+08:00 sam liu <sa...@gmail.com>:

> Hi Experts,
>
> It is really weird that DistCp could successfully get the file from
> FileZilla ftp server on Windows7, but failed from the IIS ftp server on the
> same Windows7 OS(but I can get file using wget directly: 'wget
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt' ). I tried several
> times, but all failed and encountered different error messages as below.
>
> Any comments?
>
> *[Success on FileZilla ftp server on Windows7]:*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
> 15/04/26 22:56:20 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1429858372957_0002
> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
> application_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
> http://hostname2.com:8088/proxy/application_1429858372957_0002/
> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running
> in uber mode : false
> 15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%
>
> *[Failure 1 on  IIS ftp server on the same Windows7 OS] :*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:02:45 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
> org.apache.hadoop.tools.CopyListing$InvalidInputException:
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> *[Failure 2 on  IIS ftp server on the same Windows7 OS] :*
> [biadmin@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> *[Failure 3 on  IIS ftp server on the same Windows7 OS] :*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:196)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:154)
>         at java.io.BufferedReader.read(BufferedReader.java:175)
>         at
> org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> Thanks!
>
>
> 2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:
>
>> Hi Experts,
>>
>> I could run distcp against ftp server installed on Linux, but could NOT
>> run distcp against ftp server installed on Windows. Below are the steps.
>>
>> Is this a DistCp bug? Any comments?
>>
>> [Scenario 1]
>> I installed a BI cluster using trunk build on HadoopNode1, and then could
>> copy file from a ftp installed on Linux to hdfs using command:
>> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
>> hdfs://HadoopNode1:9000/tmp/
>>
>> [Scenario 2]
>> On the same hadoop node, I can copy file from a remote ftp server
>> installed on Windows7 using command:
>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt.
>>
>> But I failed to copy file from a ftp installed on Windows7 to hdfs using
>> command:
>> [user1@HadoopNode1 ~]$ hadoop distcp
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> HadoopNode1/9.30.239.166:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>         at
>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> Thanks!
>>
>
>

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
Hi Experts,

It is really weird: DistCp can fetch a file from the FileZilla FTP server
on Windows 7, but fails against the IIS FTP server on the same Windows 7
machine, even though wget can fetch the file from IIS directly ('wget
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt'). I tried several
times; every attempt failed, each with a different error message, as shown
below.

Any comments?

*[Success on FileZilla ftp server on Windows7]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
15/04/26 22:56:20 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1429858372957_0002
15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
application_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
http://hostname2.com:8088/proxy/application_1429858372957_0002/
15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running in
uber mode : false
15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%

*[Failure 1 on IIS ftp server on the same Windows7 OS]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
15/04/27 00:02:45 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException:
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

*[Failure 2 on IIS ftp server on the same Windows7 OS]:*
[biadmin@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
15/02/01 23:03:37 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
targetPathExists=true}
15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8032
15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
without indication.
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)

*[Failure 3 on IIS ftp server on the same Windows7 OS]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
15/04/27 00:08:18 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:154)
        at java.io.BufferedReader.read(BufferedReader.java:175)
        at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
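
All three failures happen while DistCp is still building its copy listing
(FTPFileSystem.getFileStatus), before any MapReduce job is submitted, and
Failures 2 and 3 surface roughly two minutes after the ResourceManager
line, which looks like a timeout. One way to narrow this down outside
Hadoop might be a small probe that logs in, lists the parent directory and
looks for the file name, once in passive and once in active FTP mode. The
sketch below is only that: a diagnostic written with Python's standard
ftplib, not Hadoop's code path. The helper names split_ftp_url and probe
are made up for this example, and the idea that the transfer mode or the
directory listing is what differs between IIS and FileZilla is an
assumption to be tested, not a confirmed cause.

```python
import sys
from ftplib import FTP, all_errors
from posixpath import basename, dirname
from urllib.parse import unquote, urlsplit


def split_ftp_url(url):
    """Split ftp://user:pass@host:port/dir/file into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    path = unquote(parts.path) or "/"
    return (
        parts.hostname,
        parts.port or 21,                       # default FTP control port
        unquote(parts.username or "anonymous"),
        unquote(parts.password or ""),
        dirname(path) or "/",                   # directory to list
        basename(path),                         # name to look for in the listing
    )


def probe(url, passive, timeout=30):
    """List the parent directory in the given mode and report what happened."""
    host, port, user, password, parent, name = split_ftp_url(url)
    ftp = FTP()
    try:
        ftp.connect(host, port, timeout=timeout)
        ftp.login(user, password)
        ftp.set_pasv(passive)                   # True = PASV, False = PORT (active)
        names = ftp.nlst(parent)                # needs a working data connection
        # Some servers return full paths or backslashes; compare base names only.
        found = any(basename(n.replace("\\", "/")) == name for n in names)
        return "found" if found else "listing ok, but %r not in it" % name
    except all_errors as exc:
        return "failed: %r" % (exc,)
    finally:
        ftp.close()                             # safe even if connect() failed


if __name__ == "__main__" and len(sys.argv) > 1:
    for mode, passive in (("passive", True), ("active", False)):
        print(mode, "->", probe(sys.argv[1], passive))
```

Run from the Hadoop node, for example with the IIS URL used above as the
only argument. If passive mode finds the file and active mode fails or
hangs, a blocked data connection between the two hosts would be a
plausible suspect; if the listing succeeds but the name is missing, the
listing format or the path would be worth a closer look.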

Thanks!


2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:

> Hi Experts,
>
> I could run distcp against ftp server installed on Linux, but could NOT
> run distcp against ftp server installed on Windows. Below are the steps.
>
> Is this a DistCp bug? Any comments?
>
> [Scenario 1]
> I installed a BI cluster using trunk build on HadoopNode1, and then could
> copy file from a ftp installed on Linux to hdfs using command:
> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
> hdfs://HadoopNode1:9000/tmp/
>
> [Scenario 2]
> On the same hadoop node, I can copy file from a remote ftp server
> installed on Windows7 using command:
> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt.
>
> But I failed to copy file from a ftp installed on Windows7 to hdfs using
> command:
> [user1@HadoopNode1 ~]$ hadoop distcp
> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> HadoopNode1/9.30.239.166:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> Thanks!
>

>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> Thanks!
>

Re: Failed to run distcp against ftp server installed on Windows.

Posted by sam liu <sa...@gmail.com>.
Hi Experts,

It is really weird: DistCp can successfully copy the file from a FileZilla
FTP server on Windows 7, but fails against the IIS FTP server on the same
Windows 7 machine, even though I can fetch the file from IIS directly with
wget: 'wget ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt'. I tried
several times; every attempt failed, with different error messages, as below.

Any comments?

*[Success on FileZilla FTP server on Windows 7]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
15/04/26 22:56:20 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1429858372957_0002
15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
application_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
http://hostname2.com:8088/proxy/application_1429858372957_0002/
15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running in
uber mode : false
15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%

*[Failure 1 on IIS FTP server on the same Windows 7 machine]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
15/04/27 00:02:45 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException:
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

*[Failure 2 on IIS FTP server on the same Windows 7 machine]:*
[biadmin@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
15/02/01 23:03:37 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
targetPathExists=true}
15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8032
15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
without indication.
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)

*[Failure 3 on IIS FTP server on the same Windows 7 machine]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
15/04/27 00:08:18 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:154)
        at java.io.BufferedReader.read(BufferedReader.java:175)
        at
org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
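
For anyone who wants to narrow this down outside of Hadoop, below is a rough
sketch (not part of DistCp) that uses Python's standard ftplib to log in and
ask the server for the SIZE and a LIST of the same file, in either passive or
active mode. The names parse_ftp_url and probe are made up for illustration,
and it is only an assumption on my side that the data-connection mode has
anything to do with the failures above.

```python
from ftplib import FTP
from urllib.parse import urlparse, unquote


def parse_ftp_url(url):
    """Split an ftp:// URL into the pieces ftplib needs."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp:// URL: " + url)
    return {
        "host": parts.hostname,
        "port": parts.port or 21,  # fall back to the default control port
        "user": unquote(parts.username or "anonymous"),
        "password": unquote(parts.password or ""),
        "path": unquote(parts.path),
    }


def probe(url, passive=True, timeout=30):
    """Log in, then request the size and a LIST entry for the file."""
    target = parse_ftp_url(url)
    ftp = FTP()
    ftp.connect(target["host"], target["port"], timeout=timeout)
    try:
        ftp.login(target["user"], target["password"])
        ftp.set_pasv(passive)  # True: passive mode, False: active mode
        ftp.voidcmd("TYPE I")  # some servers refuse SIZE in ASCII mode
        size = ftp.size(target["path"])
        lines = []
        ftp.retrlines("LIST " + target["path"], lines.append)
        return size, lines
    finally:
        ftp.close()  # close the control socket without sending QUIT
```

Calling probe(url, passive=True) and then probe(url, passive=False) with the
IIS URL from the failures above, and again with the FileZilla URL, should show
whether the two servers answer SIZE and LIST differently when the client is
not wget.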

Thanks!


2015-02-02 15:41 GMT+08:00 sam liu <sa...@gmail.com>:

> Hi Experts,
>
> I could run distcp against ftp server installed on Linux, but could NOT
> run distcp against ftp server installed on Windows. Below are the steps.
>
> Is this a DistCp bug? Any comments?
>
> [Scenario 1]
> I installed a BI cluster using trunk build on HadoopNode1, and then could
> copy file from a ftp installed on Linux to hdfs using command:
> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
> hdfs://HadoopNode1:9000/tmp/
>
> [Scenario 2]
> On the same hadoop node, I can copy file from a remote ftp server
> installed on Windows7 using command:
> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt.
>
> But I failed to copy file from a ftp installed on Windows7 to hdfs using
> command:
> [user1@HadoopNode1 ~]$ hadoop distcp
> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> HadoopNode1/9.30.239.166:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> Thanks!
>