Posted to hdfs-dev@hadoop.apache.org by sam liu <sa...@gmail.com> on 2013/04/22 10:45:39 UTC

Why failed to use Distcp over FTP protocol?

Hi Experts,

I failed to execute the following command. Does DistCp not support the FTP protocol?

hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt
hdfs:///tmp/file1.txt

Thanks!

Re: Why failed to use Distcp over FTP protocol?

Posted by sam liu <sa...@gmail.com>.
If I execute 'hadoop distcp hdfs:///tmp/test1.txt
ftp://ftpuser:ftpuser@hostname/tmp/', the exception will be:
attempt_201304222240_0006_m_000000_1: log4j:ERROR Could not connect to
remote log4j server at [localhost]. We will try again later.
13/04/23 19:31:33 INFO mapred.JobClient: Task Id :
attempt_201304222240_0006_m_000000_2, Status : FAILED
java.io.IOException: Cannot rename parent(source):
ftp://ftpuser:ftpuser@hostname/tmp/_distcp_logs_o6gzfy/_temporary/_attempt_201304222240_0006_m_000000_2,
parent(destination):
ftp://ftpuser:ftpuser@bdvm104.svl.ibm.com/tmp/_distcp_logs_o6gzfy
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:547)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:512)
        at
org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:154)
        at
org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172)
        at
org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132)
        at
org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:221)
        at org.apache.hadoop.mapred.Task.commit(Task.java:1019)
        at org.apache.hadoop.mapred.Task.done(Task.java:889)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:373)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at
java.security.AccessController.doPrivileged(AccessController.java:310)
        at javax.security.auth.Subject.doAs(Subject.java:573)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
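[Editor's note] The rename in this trace is the task-commit step: FileOutputCommitter writes task output under _temporary/_attempt_* and then renames it into the destination directory, so a FileSystem whose rename() throws (as FTPFileSystem.rename does here) fails the whole commit. A toy sketch of that pattern, for illustration only (the map-backed "filesystem", paths, and method names are made up, not Hadoop's API):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model: a "filesystem" as a map from path to contents (hypothetical).
public class CommitByRename {
    static final Map<String, String> fs = new HashMap<>();

    // Mimics a FileSystem whose rename throws when the source is missing,
    // like the FTPFileSystem.rename call in the trace above.
    static void rename(String src, String dst) throws IOException {
        String data = fs.remove(src);
        if (data == null) {
            throw new IOException("Cannot rename parent(source): " + src);
        }
        fs.put(dst, data);
    }

    // Sketch of the commit step: output written under the task-attempt
    // directory is renamed into the job output directory.
    static void commitTask(String attemptDir, String outputDir) throws IOException {
        rename(attemptDir + "/part-00000", outputDir + "/part-00000");
    }

    public static void main(String[] args) throws IOException {
        fs.put("/out/_temporary/_attempt_0/part-00000", "data");
        commitTask("/out/_temporary/_attempt_0", "/out");
        System.out.println(fs.containsKey("/out/part-00000")); // prints true
    }
}
```
If the rename throws (for example because the FTP server rejects the operation or the temporary path was never created), the commit aborts exactly as in the stack trace above.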




Re: Why failed to use Distcp over FTP protocol?

Posted by sam liu <sa...@gmail.com>.
Now, I can successfully run "hadoop distcp
ftp://ftpuser:ftpuser@hostname/tmp/test1.txt hdfs:///tmp/test1.txt".

But it fails on "hadoop distcp hdfs:///tmp/test1.txt
ftp://ftpuser:ftpuser@hostname/tmp/test1.txt.v1", which returns:
attempt_201304222240_0005_m_000000_1: log4j:ERROR Could not connect to
remote log4j server at [localhost]. We will try again later.
13/04/23 18:59:05 INFO mapred.JobClient: Task Id :
attempt_201304222240_0005_m_000000_2, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
        at
org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at
java.security.AccessController.doPrivileged(AccessController.java:310)
        at javax.security.auth.Subject.doAs(Subject.java:573)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)



Re: Why failed to use Distcp over FTP protocol?

Posted by Daryn Sharp <da...@yahoo-inc.com>.
While I know a lot about the FileSystem design, this is my first foray into FTPFileSystem, so I don't know anything about the assumptions or requirements for the ftp server.  It looks like HDFS delete may be returning false when the file doesn't exist, whereas FTP is more appropriately throwing an exception.  If true, that behavior would explain why ftp isn't working.

It's also hard to tell exactly what's failing because the exception is being generated in a "finally" block, which may be obscuring another exception.

Daryn
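[Editor's note] The mismatch Daryn describes can be sketched with a toy example (illustration only; the in-memory set and method names are made up, not Hadoop's API): a caller written against a delete that returns false for a missing path breaks when pointed at a filesystem whose delete throws instead.

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

public class DeleteSemantics {
    static final Set<String> files = new HashSet<>();

    // HDFS-style convention: deleting a missing path just returns false.
    static boolean deleteReturningFalse(String path) {
        return files.remove(path);
    }

    // FTPFileSystem-style convention (per the trace above): a missing
    // path throws FileNotFoundException instead of returning false.
    static boolean deleteThrowing(String path) throws FileNotFoundException {
        if (!files.contains(path)) {
            throw new FileNotFoundException("File " + path + " does not exist.");
        }
        return files.remove(path);
    }

    public static void main(String[] args) {
        // A cleanup step that ignores the boolean survives convention one...
        System.out.println(deleteReturningFalse("/tmp/_distcp_tmp")); // prints false
        // ...but the identical call aborts with FNF under convention two.
        try {
            deleteThrowing("/tmp/_distcp_tmp");
        } catch (FileNotFoundException e) {
            System.out.println("FNF: " + e.getMessage());
        }
    }
}
```
This matches the trace earlier in the thread, where DistCp.fullyDelete of the missing _distcp_tmp directory surfaces a FileNotFoundException from FTPFileSystem.delete.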

On Apr 24, 2013, at 9:37 PM, sam liu wrote:

I could execute:
- hadoop fs -ls ftp://ftpuser:ftpuser@hostname/tmp/testdir
- hadoop fs -lsr ftp://ftpuser:ftpuser@hostname/tmp/testdir

Are there any special FTP configuration requirements for running the distcp tool? In my env, if I issue 'hadoop fs -lsr ftp://ftpuser:ftpuser@hostname', it returns the root path of my linux file system.


2013/4/24 Daryn Sharp <da...@yahoo-inc.com>
Listing the root is a bit of a special case that is different from N-many directories deep.  Can you list ftp://hadoopadm:xxxxxxxx@ftphostname/some/dir/file or ftp://hadoopadm:xxxxxxxx@ftphostname/some/dir?  I suspect ftp fs has a bug, so they will fail too.

On Apr 23, 2013, at 8:03 PM, sam liu wrote:

I can successfully execute "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname"; it returns the root path of the linux system.

But I failed to execute "hadoop fs -rm ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here", and it returns:
rm: Delete failed ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here


2013/4/24 Daryn Sharp <da...@yahoo-inc.com>
The ftp fs is listing the contents of the given path's parent directory, and then trying to match the basename of each child path returned against the basename of the given path – quite inefficient…  The FNF means it didn't find a match for the basename.  It may be that the ftp server isn't returning a listing in exactly the expected format so it's being parsed incorrectly.

Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here" work?  Or "hadoop fs -rm ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"?  Those cmds should exercise the same code paths where you are experiencing errors.

Daryn
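[Editor's note] A minimal sketch of the lookup pattern described above (a hypothetical simplification, not the real FTPFileSystem code; listParent stands in for a parsed FTP LIST of the parent directory): the status of a path is resolved by listing the parent and matching basenames, so a listing the client mis-parses ends in FileNotFoundException even when the file exists on the server.

```java
import java.io.FileNotFoundException;
import java.util.List;

public class BasenameLookup {
    // Stand-in for the parsed result of an FTP LIST on the parent
    // directory; if the server's listing format were parsed incorrectly,
    // this would come back empty or with mangled names.
    static List<String> listParent(String parent) {
        return List.of("file1.txt", "file2.txt");
    }

    // Resolve a path by listing its parent and matching the basename --
    // the inefficient pattern described above.
    static String getFileStatus(String path) throws FileNotFoundException {
        int slash = path.lastIndexOf('/');
        String parent = path.substring(0, slash);
        String base = path.substring(slash + 1);
        for (String child : listParent(parent)) {
            if (child.equals(base)) {
                return path; // found; the real code would build a FileStatus
            }
        }
        throw new FileNotFoundException("File " + path + " does not exist.");
    }

    public static void main(String[] args) throws FileNotFoundException {
        System.out.println(getFileStatus("/tmp/file1.txt")); // prints /tmp/file1.txt
    }
}
```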

On Apr 22, 2013, at 9:06 PM, sam liu wrote:

I encountered IOException and FileNotFoundException:

13/04/17 17:11:10 INFO mapred.JobClient: Task Id : attempt_201304160910_2135_m_000000_0, Status : FAILED
java.io.IOException: The temporary job-output directory ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)


... ...

13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304160910_2135_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.FileNotFoundException: File ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
    at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)


2013/4/23 Daryn Sharp <da...@yahoo-inc.com>
I believe it should work…  What error message did you receive?

Daryn


Re: Why failed to use Distcp over FTP protocol?

Posted by sam liu <sa...@gmail.com>.
I could execute:
- hadoop fs -ls ftp://ftpuser:ftpuser@hostname/tmp/testdir
- hadoop fs -lsr ftp://ftpuser:ftpuser@hostname/tmp/testdir

Are there any special FTP configuration requirements for running the distcp
tool? In my env, if I issue 'hadoop fs -lsr ftp://ftpuser:ftpuser@hostname',
it returns the root path of my linux file system.


>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>
>>
>> 2013/4/23 sam liu <sa...@gmail.com>
>>
>>> I encountered IOException and FileNotFoundException:
>>>
>>> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
>>> attempt_201304160910_2135_m_000000_0, Status : FAILED
>>> java.io.IOException: The temporary job-output directory
>>> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
>>>     at
>>> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>>>     at
>>> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
>>>     at
>>> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>>>     at
>>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>     at
>>> java.security.AccessController.doPrivileged(AccessController.java:310)
>>>     at javax.security.auth.Subject.doAs(Subject.java:573)
>>>     at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>>
>>>
>>> ... ...
>>>
>>> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete:
>>> job_201304160910_2135
>>> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
>>> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
>>> 13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
>>> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
>>> 13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
>>> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all
>>> reduces waiting after reserving slots (ms)=0
>>> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all
>>> maps waiting after reserving slots (ms)=0
>>> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
>>> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map
>>> Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
>>> task_201304160910_2135_m_000000
>>> With failures, global counters are inaccurate; consider running with -i
>>> Copy failed: java.io.FileNotFoundException: File
>>> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not
>>> exist.
>>>     at
>>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
>>>     at
>>> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
>>>     at
>>> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
>>>     at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>>>     at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>>
>>>
>>> 2013/4/23 Daryn Sharp <da...@yahoo-inc.com>
>>>
>>>> I believe it should work…  What error message did you receive?
>>>>
>>>> Daryn
>>>>
>>>> On Apr 22, 2013, at 3:45 AM, sam liu wrote:
>>>>
>>>> > Hi Experts,
>>>> >
>>>> > I failed to execute the following command; doesn't DistCp support the
>>>> FTP protocol?
>>>> >
>>>> > hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt
>>>> > hdfs:///tmp/file1.txt
>>>> >
>>>> > Thanks!
>>>>
>>>>
>>>
>>
>>
>
>

Re: Why failed to use Distcp over FTP protocol?

Posted by Daryn Sharp <da...@yahoo-inc.com>.
Listing the root is a bit of a special case that is different than N-many directories deep.  Can you list ftp://hadoopadm:xxxxxxxx@ftphostname/some/dir/file or ftp://hadoopadm:xxxxxxxx@ftphostname/some/dir?  I suspect ftp fs has a bug, so they will fail too.
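The root special case can be seen in a minimal sketch (this models the lookup strategy discussed in this thread, not the actual org.apache.hadoop.fs.ftp.FTPFileSystem source; all names below are hypothetical):

```java
import java.io.FileNotFoundException;

// Toy model of the lookup strategy described in this thread: the root
// needs no parent listing, so "-ls /" can succeed even when every
// nested lookup (which must parse the parent's LIST reply) fails.
public class RootLookupSketch {
    static String status(String path, String[] parentListing)
            throws FileNotFoundException {
        if (path.equals("/")) {
            return "dir /";                       // root: fabricated directly
        }
        String base = path.substring(path.lastIndexOf('/') + 1);
        for (String name : parentListing) {       // names parsed from LIST
            if (name.equals(base)) {
                return "found " + base;
            }
        }
        // A mis-parsed listing means no entry ever matches the basename:
        throw new FileNotFoundException(path + " does not exist.");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(status("/", new String[0]));               // succeeds
        System.out.println(status("/some/dir", new String[]{"dir"})); // succeeds
    }
}
```

If the nested forms fail while the root succeeds, that points at the listing parse rather than at distcp itself.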

On Apr 23, 2013, at 8:03 PM, sam liu wrote:

I can successfully execute "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname"; it returns the root path of the Linux system.

But I failed to execute "hadoop fs -rm ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here", and it returns:
rm: Delete failed ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here


2013/4/24 Daryn Sharp <da...@yahoo-inc.com>
The ftp fs is listing the contents of the given path's parent directory, and then trying to match the basename of each child path returned against the basename of the given path – quite inefficient…  The FNF is it didn't find a match for the basename.  It may be that the ftp server isn't returning a listing in exactly the expected format so it's being parsed incorrectly.

Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here" work?  Or "hadoop fs -rm ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"?  Those cmds should exercise the same code paths where you are experiencing errors.

Daryn

On Apr 22, 2013, at 9:06 PM, sam liu wrote:

I encountered IOException and FileNotFoundException:

13/04/17 17:11:10 INFO mapred.JobClient: Task Id : attempt_201304160910_2135_m_
000000_0, Status : FAILED
java.io.IOException: The temporary job-output directory ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)


... ...

13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304160910_2135_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.FileNotFoundException: File ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
    at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)


2013/4/23 sam liu <sa...@gmail.com>
I encountered IOException and FileNotFoundException:

13/04/17 17:11:10 INFO mapred.JobClient: Task Id : attempt_201304160910_2135_m_000000_0, Status : FAILED
java.io.IOException: The temporary job-output directory ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)


... ...

13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304160910_2135_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.FileNotFoundException: File ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
    at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)


2013/4/23 Daryn Sharp <da...@yahoo-inc.com>
I believe it should work…  What error message did you receive?

Daryn

On Apr 22, 2013, at 3:45 AM, sam liu wrote:

> Hi Experts,
>
> I failed to execute the following command; doesn't DistCp support the FTP protocol?
>
> hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt
> hdfs:///tmp/file1.txt
>
> Thanks!







Re: Why failed to use Distcp over FTP protocol?

Posted by sam liu <sa...@gmail.com>.
I can successfully execute "hadoop fs -ls
ftp://hadoopadm:xxxxxxxx@ftphostname";
it returns the root path of the Linux system.

But I failed to execute "hadoop fs -rm
ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here", and it returns:
rm: Delete failed
ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here


2013/4/24 Daryn Sharp <da...@yahoo-inc.com>

>  The ftp fs is listing the contents of the given path's parent directory,
> and then trying to match the basename of each child path returned against
> the basename of the given path – quite inefficient…  The FNF is it didn't
> find a match for the basename.  It may be that the ftp server isn't
> returning a listing in exactly the expected format so it's being parsed
> incorrectly.
>
>  Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"
> work?  Or "hadoop fs -rm
> ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"?  Those cmds should
> exercise the same code paths where you are experiencing errors.
>
>  Daryn
>
>  On Apr 22, 2013, at 9:06 PM, sam liu wrote:
>
>  I encountered IOException and FileNotFoundException:
>
> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
> attempt_201304160910_2135_m_
> 000000_0, Status : FAILED
> java.io.IOException: The temporary job-output directory
> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
>     at
> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>     at
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
>     at
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>     at
> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at
> java.security.AccessController.doPrivileged(AccessController.java:310)
>     at javax.security.auth.Subject.doAs(Subject.java:573)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> ... ...
>
> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete:
> job_201304160910_2135
> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
> 13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
> 13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks
> exceeded allowed limit. FailedCount: 1. LastFailedTask:
> task_201304160910_2135_m_000000
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.FileNotFoundException: File
> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not
> exist.
>     at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
>     at
> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
>     at
> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
>     at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>     at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
>
> 2013/4/23 sam liu <sa...@gmail.com>
>
>> I encountered IOException and FileNotFoundException:
>>
>> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
>> attempt_201304160910_2135_m_000000_0, Status : FAILED
>> java.io.IOException: The temporary job-output directory
>> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
>>     at
>> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>>     at
>> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
>>     at
>> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>>     at
>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at
>> java.security.AccessController.doPrivileged(AccessController.java:310)
>>     at javax.security.auth.Subject.doAs(Subject.java:573)
>>     at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>>
>> ... ...
>>
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete:
>> job_201304160910_2135
>> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
>> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
>> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all
>> reduces waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps
>> waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map
>> Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
>> task_201304160910_2135_m_000000
>> With failures, global counters are inaccurate; consider running with -i
>> Copy failed: java.io.FileNotFoundException: File
>> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not
>> exist.
>>     at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
>>     at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
>>     at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
>>     at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>>     at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>
>>
>> 2013/4/23 Daryn Sharp <da...@yahoo-inc.com>
>>
>>> I believe it should work…  What error message did you receive?
>>>
>>> Daryn
>>>
>>> On Apr 22, 2013, at 3:45 AM, sam liu wrote:
>>>
>>> > Hi Experts,
>>> >
>>> > I failed to execute the following command; doesn't DistCp support the
>>> FTP protocol?
>>> >
>>> > hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt
>>> > hdfs:///tmp/file1.txt
>>> >
>>> > Thanks!
>>>
>>>
>>
>
>

Re: Why failed to use Distcp over FTP protocol?

Posted by Daryn Sharp <da...@yahoo-inc.com>.
The ftp fs is listing the contents of the given path's parent directory, and then trying to match the basename of each child path returned against the basename of the given path – quite inefficient…  The FNF is it didn't find a match for the basename.  It may be that the ftp server isn't returning a listing in exactly the expected format so it's being parsed incorrectly.

Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here" work?  Or "hadoop fs -rm ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"?  Those cmds should exercise the same code paths where you are experiencing errors.
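The parent-listing-plus-basename-match lookup above, and how a LIST reply in an unexpected format turns an existing file into FileNotFoundException, can be sketched like this (illustrative Java with a hypothetical toy parser; the real implementation in FTPFileSystem uses commons-net and is more involved):

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the failure mode: list the parent, parse each LIST line into
// a name, match the basename. If the server's LIST format differs from
// what the parser expects, no names are extracted and the lookup throws
// FNF even though the file exists on the server.
public class ListingParseSketch {
    // Toy parser that only understands the common Unix "ls -l" layout:
    // "-rw-r--r-- 1 user group 42 Apr 22 10:45 file1.txt"
    static List<String> parseNames(String[] rawListLines) {
        List<String> names = new ArrayList<>();
        for (String line : rawListLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length == 9) {
                names.add(fields[8]);   // last field is the file name
            }                           // unexpected formats are silently dropped
        }
        return names;
    }

    static String getFileStatus(String[] rawListLines, String path)
            throws FileNotFoundException {
        String base = path.substring(path.lastIndexOf('/') + 1);
        for (String name : parseNames(rawListLines)) {
            if (name.equals(base)) {
                return name;            // a real FS would build a FileStatus here
            }
        }
        throw new FileNotFoundException("File " + path + " does not exist.");
    }

    public static void main(String[] args) throws Exception {
        String[] unixStyle =
            {"-rw-r--r-- 1 hadoopadm users 42 Apr 22 10:45 _distcp_tmp_i74spu"};
        System.out.println(getFileStatus(unixStyle, "/tmp/_distcp_tmp_i74spu"));

        // A server replying in, e.g., MS-DOS style defeats the parser:
        String[] dosStyle = {"04-22-13 10:45AM 42 _distcp_tmp_i74spu"};
        try {
            getFileStatus(dosStyle, "/tmp/_distcp_tmp_i74spu");
        } catch (FileNotFoundException e) {
            System.out.println("FNF: " + e.getMessage());
        }
    }
}
```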

Daryn

On Apr 22, 2013, at 9:06 PM, sam liu wrote:

I encountered IOException and FileNotFoundException:

13/04/17 17:11:10 INFO mapred.JobClient: Task Id : attempt_201304160910_2135_m_
000000_0, Status : FAILED
java.io.IOException: The temporary job-output directory ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)


... ...

13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304160910_2135_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.FileNotFoundException: File ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
    at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)


2013/4/23 sam liu <sa...@gmail.com>
I encountered IOException and FileNotFoundException:

13/04/17 17:11:10 INFO mapred.JobClient: Task Id : attempt_201304160910_2135_m_000000_0, Status : FAILED
java.io.IOException: The temporary job-output directory ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)


... ...

13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304160910_2135_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.FileNotFoundException: File ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
    at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)


2013/4/23 Daryn Sharp <da...@yahoo-inc.com>
I believe it should work…  What error message did you receive?

Daryn

On Apr 22, 2013, at 3:45 AM, sam liu wrote:

> Hi Experts,
>
> I failed to execute the following command; doesn't DistCp support the FTP protocol?
>
> hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt
> hdfs:///tmp/file1.txt
>
> Thanks!





Re: Why failed to use Distcp over FTP protocol?

Posted by sam liu <sa...@gmail.com>.
I encountered IOException and FileNotFoundException:

13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
attempt_201304160910_2135_m_
000000_0, Status : FAILED
java.io.IOException: The temporary job-output directory
ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary
doesn't exist!
    at
org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at
org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at
org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at
org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at
java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)


... ...

13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks
exceeded allowed limit. FailedCount: 1. LastFailedTask:
task_201304160910_2135_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.FileNotFoundException: File
ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
    at
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
    at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)


2013/4/23 sam liu <sa...@gmail.com>

> I encountered IOException and FileNotFoundException:
>
> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
> attempt_201304160910_2135_m_000000_0, Status : FAILED
> java.io.IOException: The temporary job-output directory
> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary
> doesn't exist!
>     at
> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>     at
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
>     at
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>     at
> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at
> java.security.AccessController.doPrivileged(AccessController.java:310)
>     at javax.security.auth.Subject.doAs(Subject.java:573)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> ... ...
>
> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete:
> job_201304160910_2135
> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
> 13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
> 13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks
> exceeded allowed limit. FailedCount: 1. LastFailedTask:
> task_201304160910_2135_m_000000
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.FileNotFoundException: File
> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not
> exist.
>     at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
>     at
> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
>     at
> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
>     at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>     at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
>
> 2013/4/23 Daryn Sharp <da...@yahoo-inc.com>
>
>> I believe it should work…  What error message did you receive?
>>
>> Daryn
>>
>> On Apr 22, 2013, at 3:45 AM, sam liu wrote:
>>
>> > Hi Experts,
>> >
>> > I failed to execute following command, does not Distcp support FTP
>> protocol?
>> >
>> > hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt
>> > hdfs:///tmp/file1.txt
>> >
>> > Thanks!
>>
>>
>

Re: Why failed to use Distcp over FTP protocol?

Posted by Daryn Sharp <da...@yahoo-inc.com>.
I believe it should work…  What error message did you receive?

Daryn
 
On Apr 22, 2013, at 3:45 AM, sam liu wrote:

> Hi Experts,
> 
> I failed to execute following command, does not Distcp support FTP protocol?
> 
> hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt
> hdfs:///tmp/file1.txt
> 
> Thanks!
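
[Editor's note] For readers hitting the same FTPFileSystem rename failures shown in the stack traces above, one common workaround is to stage the file outside DistCp: pull it over FTP to local disk, then upload it with the ordinary FileSystem shell. The sketch below is hypothetical — the host name, credentials, and paths are placeholders, not values from this thread — and assumes a machine that has both curl (which speaks FTP) and the Hadoop CLI installed:

```shell
#!/bin/sh
# Hypothetical workaround sketch: bypass DistCp's FTP source by staging locally.

build_ftp_url() {
  # Compose an ftp://user:pass@host/path URL from its parts.
  # (Embedding the password in the URL, as in the thread, exposes it in logs;
  # prefer a .netrc file in real use.)
  user="$1"; pass="$2"; host="$3"; path="$4"
  printf 'ftp://%s:%s@%s%s' "$user" "$pass" "$host" "$path"
}

stage_and_put() {
  # 1. Fetch the remote file into a local temp directory over FTP.
  # 2. Push it into HDFS with `hadoop fs -put`.
  url="$1"; hdfs_dest="$2"
  tmp=$(mktemp -d)
  name=$(basename "$hdfs_dest")
  curl -sS -o "$tmp/$name" "$url"
  hadoop fs -put -f "$tmp/$name" "$hdfs_dest"
  rm -rf "$tmp"
}

# Example (placeholders only):
#   stage_and_put "$(build_ftp_url hadoopadm secret ftphostname /tmp/file1.txt)" \
#                 /tmp/file1.txt
```

This avoids DistCp's FileOutputCommitter ever writing its _distcp_logs/_temporary directories to the FTP filesystem, which is where both rename failures in the thread originate.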