Posted to common-dev@hadoop.apache.org by "Christian Kunz (JIRA)" <ji...@apache.org> on 2008/10/01 01:35:44 UTC

[jira] Created: (HADOOP-4318) distcp fails

distcp fails
------------

                 Key: HADOOP-4318
                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.17.2
            Reporter: Christian Kunz
            Priority: Blocker
             Fix For: 0.17.3


We run distcp between two clusters running 0.17.2, using hdfs.

As long as one of the tasks fails after opening a file for writing (which typically happens), subsequent retries will always fail with the following exception (we did not see this with 0.16.3; this seems to be a regression):

2008-09-30 22:54:49,430 INFO org.apache.hadoop.util.CopyFiles: FAIL 3169 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file xxx/_distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
	at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:1010)
	at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:967)
	at org.apache.hadoop.dfs.NameNode.create(NameNode.java:269)
	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

	at org.apache.hadoop.ipc.Client.call(Client.java:557)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
	at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2192)
	at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:479)
	at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:138)
	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.create(CopyFiles.java:317)
	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.copy(CopyFiles.java:369)
	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:493)
	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:268)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
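
For context on the exception: HDFS enforces a single writer per file via leases tracked on the namenode, and a create on a path whose lease is still held is rejected, which is what each retry hits here. A toy sketch of that check, assuming nothing about the real FSNamesystem internals beyond the message in the log above (names and messages are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the namenode-side single-writer check. Real HDFS
// tracks leases with expiry and recovery; this only mirrors the failure
// mode seen in the log: a stale lease from a dead task blocks re-creation.
public class LeaseCheck {
    private final Map<String, String> leaseHolderByPath = new HashMap<>();

    public void startFile(String path, String clientName) {
        String holder = leaseHolderByPath.get(path);
        if (holder != null) {
            throw new IllegalStateException(
                "failed to create file " + path + " for " + clientName
                + " because " + (holder.equals(clientName)
                    ? "current leaseholder is trying to recreate file"
                    : "the file is already being created by " + holder));
        }
        leaseHolderByPath.put(path, clientName);
    }

    public void completeFile(String path, String clientName) {
        // Closing the file releases the lease; a killed task never gets here.
        leaseHolderByPath.remove(path, clientName);
    }

    public static void main(String[] args) {
        LeaseCheck nn = new LeaseCheck();
        nn.startFile("/tmp/3169", "attempt_0");     // task dies before close
        try {
            nn.startFile("/tmp/3169", "attempt_1"); // retry is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```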


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636266#action_12636266 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4318:
------------------------------------------------

> Don't I need this?

I thought we should delete the tmp file right before copying. However, there is already some cleanup code (which has a bug), so I reverted this change.




[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636217#action_12636217 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4318:
------------------------------------------------

> What about the wrong dst file name? Is this something that could be fixed easily?
I found another bug for the wrong dst. I will upload a new patch soon.




[jira] Updated: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4318:
-------------------------------------------

    Attachment: 4318_20081002_0.17.patch

4318_20081002_0.17.patch: combined patch for 0.17




[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636650#action_12636650 ] 

Hudson commented on HADOOP-4318:
--------------------------------

Integrated in Hadoop-trunk #622 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/622/])
    . DistCp should use absolute paths for cleanup.  (szetszwo)
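
The committed fix direction, per the log line above, is qualifying the cleanup path so a relative tmp dir cannot resolve against the wrong base directory. A minimal, self-contained sketch of that idea (java.nio paths stand in for Hadoop's Path here, and the helper name is illustrative, not the actual DistCp code):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the bug class the commit addresses: a cleanup step handed a
// relative path can resolve against the wrong working directory, so the
// delete silently misses the real tmp dir. Qualifying the path up front
// pins it to a known base.
public class AbsoluteCleanup {
    // Hypothetical helper: resolve a possibly-relative tmp path against a
    // known base before any later cleanup uses it.
    static Path qualify(Path base, Path tmp) {
        return tmp.isAbsolute() ? tmp.normalize() : base.resolve(tmp).normalize();
    }

    public static void main(String[] args) {
        Path base = Paths.get("/user/distcp");
        System.out.println(qualify(base, Paths.get("_distcp_tmp_dvml74")));
        // -> /user/distcp/_distcp_tmp_dvml74
        System.out.println(qualify(base, Paths.get("/data/_distcp_tmp")));
        // -> /data/_distcp_tmp
    }
}
```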





[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636493#action_12636493 ] 

Hadoop QA commented on HADOOP-4318:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12391377/4318_20081002.patch
  against trunk revision 700997.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3425/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3425/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3425/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3425/console

This message is automatically generated.




[jira] Issue Comment Edited: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636270#action_12636270 ] 

ckunz edited comment on HADOOP-4318 at 10/1/08 7:15 PM:
-----------------------------------------------------------------

I would argue that we need both, because a task can get killed in the middle of copying with no time to clean up.

      was (Author: ckunz):
    I would argue that we need both, because a task can get killed in the middle of copying with having time to cleanup.
  



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636206#action_12636206 ] 

Christian Kunz commented on HADOOP-4318:
----------------------------------------

bq. By wrong path, are you saying that .../_distcp_tmp_dvml74/3169 and /user/.../3169 are distinct?
Yes, they are distinct.

bq. DistCp first copies src to tmp and then renames tmp to dst. The file it is deleting is dst, not tmp.
/user/.../3169 is not dst. It is a different file. The user is correct, but the distcp command specified a dst directory other than /user.




[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636199#action_12636199 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4318:
------------------------------------------------

> but it uses the wrong path.
By wrong path, are you saying that .../_distcp_tmp_dvml74/3169 and /user/.../3169 are distinct?

DistCp first copies src to tmp and then renames tmp to dst. The file it deletes is dst, not tmp.

From the log above, I think the problem is that the map task dies while copying src to tmp and leaves the tmp file open (even though there is a close in the code). So the retries hit an AlreadyBeingCreatedException.

If that is the case, we should try deleting tmp before the copy.
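The delete-tmp-before-copy idea can be sketched as follows. This is an illustrative sketch using java.nio.file rather than the Hadoop FileSystem API (paths and class name here are invented; the actual fix would call FileSystem.delete on the tmp path before FileSystem.create in CopyFiles.java):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpThenRenameCopy {
    // Copy src to dst via a tmp file, deleting any stale tmp left behind by a
    // previous failed attempt so a retry can recreate it cleanly.
    static void copyViaTmp(Path src, Path tmp, Path dst) throws IOException {
        Files.deleteIfExists(tmp);  // clear stale tmp from a dead task before creating
        Files.copy(src, tmp);       // copy src -> tmp
        Files.move(tmp, dst,        // rename tmp -> dst once the copy completed
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("distcp-sketch");
        Path src = dir.resolve("src"), tmp = dir.resolve("tmp"), dst = dir.resolve("dst");
        Files.write(src, "data".getBytes());
        Files.write(tmp, "stale".getBytes());  // simulate tmp left by a failed attempt
        copyViaTmp(src, tmp, dst);
        System.out.println(new String(Files.readAllBytes(dst)));
    }
}
```

On a retry, the stale tmp is removed up front, so the create cannot collide with a file the previous attempt left open.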

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.17.3
>
>
> we run distcp between two clusters running 0.17.2 using hdfs.
> As long as one of the tasks fails after opening a file for writing (which typically always happens), subsequent retries will always fail with the following exception (we did not see this with 0.16.3, seems to be a regression):
> 2008-09-30 22:54:49,430 INFO org.apache.hadoop.util.CopyFiles: FAIL 3169 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file xxx/_distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
> 	at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:1010)
> 	at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:967)
> 	at org.apache.hadoop.dfs.NameNode.create(NameNode.java:269)
> 	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:557)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
> 	at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 	at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2192)
> 	at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:479)
> 	at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:138)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.create(CopyFiles.java:317)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.copy(CopyFiles.java:369)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:493)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:268)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)



[jira] Updated: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4318:
-------------------------------------------

    Attachment: 4318_20081001_0.17b.patch

4318_20081001_0.17b.patch: fixed a bug in cleanup

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.17.3
>
>         Attachments: 4318_20081001_0.17.patch, 4318_20081001_0.17b.patch



[jira] Updated: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4318:
-------------------------------------------

    Attachment: 4318_20081001_0.17.patch

4318_20081001_0.17.patch: delete tmp before copy.

Christian, how could I reproduce this?  Or could you try the patch?

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.17.3
>
>         Attachments: 4318_20081001_0.17.patch



[jira] Updated: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4318:
-------------------------------------------

    Component/s: tools/distcp
       Assignee: Tsz Wo (Nicholas), SZE

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.17.3
>
>         Attachments: 4318_20081001_0.17.patch, 4318_20081001_0.17b.patch



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636106#action_12636106 ] 

Devaraj Das commented on HADOOP-4318:
-------------------------------------

This most likely is related to HADOOP-4264

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.17.3



[jira] Updated: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4318:
-------------------------------------------

    Attachment: 4318_20081002.patch

4318_20081002.patch: for trunk

No new tests are added since the code changes are simple and I don't see a good way to write a test. The test would have to start a distcp job and then somehow make some tasks fail.

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.17.3
>
>         Attachments: 4318_20081001_0.17.patch, 4318_20081001_0.17b.patch, 4318_20081002.patch, 4318_20081002_0.17.patch



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636294#action_12636294 ] 

Christian Kunz commented on HADOOP-4318:
----------------------------------------

After applying the patches, distcp worked. Thanks, Nicholas.

For the record, I applied both patches.

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.17.3
>
>         Attachments: 4318_20081001_0.17.patch, 4318_20081001_0.17b.patch



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636212#action_12636212 ] 

Christian Kunz commented on HADOOP-4318:
----------------------------------------

Nicholas, I will try your patch in a couple of hours. Thanks.

What about the wrong dst file name? Is that something that could be fixed easily?

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.17.3
>
>         Attachments: 4318_20081001_0.17.patch



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636167#action_12636167 ] 

Christian Kunz commented on HADOOP-4318:
----------------------------------------

I would be surprised if this is the same issue.
Part of this issue seems to be that distcp tries to work around the AlreadyBeingCreatedException by deleting the file, but it uses the *wrong path*.

From what I can see, a distcp task starts copying into a destination file, fails because of some issue, and the subsequent retries cannot create the destination file because there is still a lease on it. And the attempts to delete the file do not succeed because they use the *wrong path* /user/.../3169.

Here is the corresponding log of the namenode:

2008-09-30 22:54:49,429 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file ..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
2008-09-30 22:54:49,429 INFO org.apache.hadoop.ipc.Server: IPC Server handler 58 on 8600, call .../_distcp_tmp_dvml74/3169, rw-r--r--, DFSClient_task_200809121811_0034_m_001085_1, true, 3, 134217728) from xxx.yyy.zzz.uuu:36614: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file
..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.

org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file ..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current lease
holder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:1010)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:967)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:269)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

2008-09-30 22:54:49,431 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove */user/.../3169* because it does not exist.
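The mismatch the log shows — the lease is held on the file under the tmp directory, while the cleanup delete targets the final destination path — can be illustrated with a small sketch. The directory layout and class name below are invented for illustration, and java.nio.file stands in for the Hadoop FileSystem API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WrongPathDelete {
    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("distcp-sketch");
        // Hypothetical layout: the half-written file lives under the tmp
        // directory, not at the final destination path.
        Path tmp = Files.createDirectories(root.resolve("_distcp_tmp")).resolve("3169");
        Path dst = root.resolve("user").resolve("3169");
        Files.write(tmp, "partial".getBytes());  // simulate the half-written tmp file

        // Deleting the dst path is a no-op, matching the log's
        // "failed to remove ... because it does not exist".
        System.out.println("deleted dst? " + Files.deleteIfExists(dst));
        // The stale tmp file is what actually has to be removed before a retry.
        System.out.println("deleted tmp? " + Files.deleteIfExists(tmp));
    }
}
```

Under this layout, the delete aimed at dst always reports failure while the stale tmp file (and, in HDFS, its lease) stays in place, which matches the retries failing forever.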


> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.17.3
>
>
> we run distcp between two clusters running 0.17.2 using hdfs.
> As long as one of the tasks fails after opening a file for writing (which typically always happens), subsequent retries will always fail with the following exception (we did not see this with 0.16.3, seems to be a regression):
> 2008-09-30 22:54:49,430 INFO org.apache.hadoop.util.CopyFiles: FAIL 3169 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file xxx/_distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
> 	at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:1010)
> 	at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:967)
> 	at org.apache.hadoop.dfs.NameNode.create(NameNode.java:269)
> 	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:557)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
> 	at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 	at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2192)
> 	at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:479)
> 	at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:138)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.create(CopyFiles.java:317)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.copy(CopyFiles.java:369)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:493)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:268)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4318:
-------------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.



[jira] Issue Comment Edited: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636206#action_12636206 ] 

ckunz edited comment on HADOOP-4318 at 10/1/08 3:21 PM:
-----------------------------------------------------------------

bq. By wrong path, are you saying that .../_distcp_tmp_dvml74/3169 and /user/.../3169 are distinct?
Yes, they are distinct.

bq. DistCp first copies src to tmp and then renames tmp to dst. The file it is deleting is dst, not tmp.
/user/.../3169 is not dst; it is a different file. The user portion of the path is correct, but the distcp command specified a dst directory other than /user.



[jira] Updated: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4318:
-------------------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636254#action_12636254 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4318:
------------------------------------------------

Hi Christian, you only need 4318_20081001_0.17b.patch.



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636270#action_12636270 ] 

Christian Kunz commented on HADOOP-4318:
----------------------------------------

I would argue that we need both, because a task can get killed in the middle of copying without having time to clean up.



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636260#action_12636260 ] 

Christian Kunz commented on HADOOP-4318:
----------------------------------------

I noticed that 4318_20081001_0.17b.patch did not contain:

@@ -304,8 +304,11 @@
 
     private FSDataOutputStream create(Path f, Reporter reporter,
         FileStatus srcstat) throws IOException {
+      if (destFileSys.exists(f)) {
+        destFileSys.delete(f, false);
+      }

Don't I need this? Looking at the source code of startFileInternal, I see that the AlreadyBeingCreatedException is thrown before the overwrite flag is checked for deleting the file.
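
This reading matches the failure mode: if the server checks the lease before it consults the overwrite flag, then overwrite=true alone can never recover the file, and the client needs an explicit delete first. The toy model below illustrates that ordering; it is plain self-contained Java with hypothetical names, not Hadoop code, and it simplifies the real semantics so that any outstanding lease blocks re-creation:

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseOrderingDemo {
    // path -> current lease holder
    static final Map<String, String> leases = new HashMap<>();

    // Models the ordering in 0.17's startFileInternal: the lease check
    // happens BEFORE the overwrite flag is examined, so overwrite=true
    // cannot recover a file whose previous writer died holding the lease.
    static void startFile(String path, String holder, boolean overwrite) {
        String current = leases.get(path);
        if (current != null) {
            // thrown before overwrite is ever looked at
            throw new IllegalStateException("AlreadyBeingCreated: held by " + current);
        }
        if (overwrite) { /* existing file contents would be discarded here */ }
        leases.put(path, holder);
    }

    // Deleting the file also releases its lease.
    static void delete(String path) {
        leases.remove(path);
    }

    public static void main(String[] args) {
        startFile("/tmp/3169", "attempt_0", true);     // first attempt, then dies
        try {
            startFile("/tmp/3169", "attempt_1", true); // retry fails despite overwrite=true
        } catch (IllegalStateException expected) {
            delete("/tmp/3169");                       // the guard from the patch hunk above
            startFile("/tmp/3169", "attempt_1", true); // now succeeds
        }
        System.out.println("retry succeeded");
    }
}
```

Under this model, the exists/delete guard in the patch hunk is what breaks the cycle, which is why overwrite-on-create alone is not sufficient.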




[jira] Updated: (HADOOP-4318) distcp fails

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4318:
----------------------------------

    Hadoop Flags: [Reviewed]

+1 Looks good



[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636247#action_12636247 ] 

Christian Kunz commented on HADOOP-4318:
----------------------------------------

I am now applying both patches...

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.17.3
>
>         Attachments: 4318_20081001_0.17.patch, 4318_20081001_0.17b.patch
>
>
> we run distcp between two clusters running 0.17.2 using hdfs.
> As long as one of the tasks fails after opening a file for writing (which typically always happens), subsequent retries will always fail with the following exception (we did not see this with 0.16.3, seems to be a regression):
> 2008-09-30 22:54:49,430 INFO org.apache.hadoop.util.CopyFiles: FAIL 3169 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file xxx/_distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
> 	at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:1010)
> 	at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:967)
> 	at org.apache.hadoop.dfs.NameNode.create(NameNode.java:269)
> 	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:557)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
> 	at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 	at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2192)
> 	at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:479)
> 	at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:138)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.create(CopyFiles.java:317)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.copy(CopyFiles.java:369)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:493)
> 	at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:268)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)



[jira] Issue Comment Edited: (HADOOP-4318) distcp fails

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636167#action_12636167 ] 

ckunz edited comment on HADOOP-4318 at 10/1/08 3:21 PM:
-----------------------------------------------------------------

I would be surprised if this is the same issue.
Part of this issue seems to be that distcp tries to work around the AlreadyBeingCreatedException by deleting the file, but it uses the *wrong path*.

From what I can see, a distcp task starts copying into a destination file, fails because of some issue, and subsequent retries cannot create the destination file because there is still a lease on it. The attempts to delete the file do not succeed because they use the *wrong path* /user/.../3164.

Here is the corresponding log of the namenode:

2008-09-30 22:54:49,429 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file ..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
2008-09-30 22:54:49,429 INFO org.apache.hadoop.ipc.Server: IPC Server handler 58 on 8600, call .../_distcp_tmp_dvml74/3169, rw-r--r--, DFSClient_task_200809121811_0034_m_001085_1, true, 3, 134217728) from xxx.yyy.zzz.uuu:36614: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file ..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.

org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file ..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:1010)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:967)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:269)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

2008-09-30 22:54:49,431 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove */user/.../3169* because it does not exist.
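The path mix-up the log shows can be sketched with plain Java paths. This is an illustration only, under the assumption that the retry's cleanup must target distcp's per-file temp output rather than the destination-side path; the class, the tmpFile helper, and the directory literals are hypothetical, not the actual org.apache.hadoop.util.CopyFiles code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DistcpPathSketch {
    // Build the path of the per-file temp output (<tmpDir>/<fileIndex>),
    // i.e. the file that still holds the HDFS lease after a failed attempt.
    static Path tmpFile(Path tmpDir, int fileIndex) {
        return tmpDir.resolve(Integer.toString(fileIndex));
    }

    public static void main(String[] args) {
        Path tmpDir  = Paths.get("/dest/_distcp_tmp_dvml74"); // distcp's temp dir (illustrative)
        Path destDir = Paths.get("/user/dest");               // final destination dir (illustrative)

        // Correct target for the pre-retry delete: the leased temp file.
        System.out.println("delete before retry: " + tmpFile(tmpDir, 3169));

        // What the namenode log shows being deleted instead: a destination-side
        // path that does not exist yet, so the delete is a no-op and the lease
        // on the temp file survives into the next retry.
        System.out.println("wrong path deleted:  " + destDir.resolve("3169"));
    }
}
```

With the wrong path, unprotectedDelete fails ("because it does not exist"), which matches the last log line above.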




[jira] Commented: (HADOOP-4318) distcp fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636409#action_12636409 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4318:
------------------------------------------------

Thanks, Christian.  I will combine the two patches unless doing so conflicts with the original design.
