You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-dev@hadoop.apache.org by "Gautam Kowshik (JIRA)" <ji...@apache.org> on 2007/10/22 17:04:51 UTC

[jira] Created: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Errors for subsequent requests for file creation after original DFSClient goes down..
-------------------------------------------------------------------------------------

                 Key: HADOOP-2087
                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: Gautam Kowshik
             Fix For: 0.15.0



task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 

2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537109 ] 

Raghu Angadi commented on HADOOP-2087:
--------------------------------------

> So the first question to answer is whether the retry frameworks does what it is intended to do that is retry.

Retry after one minute applies only when there is a timeout. Otherwise it retries after a few millisec. Comment from http://issues.apache.org/jira/browse/HADOOP-1263#action_12504135 :

{quote}
I am planning to use this framework for some new RPCs I am adding. I just want to confirm if my understanding is correct: This patch adds a random exponential back off timeout starting with 400 milliseconds for 5 times. In all 5 retries, this add a max of 12 seconds. Since client RPC timeout is 60sec, time it takes for such RPC to fail takes between 300-312 seconds over 6 attempts. Is this expected?, because it is not exponential back off but essentially constant timeout of around 60sec for each retry.
{quote}

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>             Fix For: 0.15.0
>
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Mukund Madhugiri (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537184 ] 

Mukund Madhugiri commented on HADOOP-2087:
------------------------------------------

I don't see this on every run of the TestDFSIO benchmark. The latest run that is just wrapping up does not exhibit this problem

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>             Fix For: 0.15.0
>
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537104 ] 

Konstantin Shvachko commented on HADOOP-2087:
---------------------------------------------

>From conversation with Hairong, DFSClient should be retrying automatically to create a file after a 1 minute sleep
if it receives AlreadyBeingCreatedException. This is a part of the retry policy introduced by HADOOP-1263.

# So the first question to answer is whether the retry frameworks does what it is intended to do that is retry.
# If it does then we should look into why the first task (_0) does not shut down properly, because that would mean
the first task is still up and renewing the lease for the file.
# And the last question is whether DFSClient SHOULD do automatic retries on AlreadyBeingCreatedException.
I think hdfs create() should throw this exception to the application, because applications might want to create a different 
file or wait for the lease to expire depending on its internal logic. In case of TestDFSIO it should wait.
Throwing should be explicit, meaning that the create api should explicitly list AlreadyBeingCreatedException as one of 
the exceptions thrown by create(), rather than just generic IOException.

Whether its critical for 0.15 or not depends on whether we can run TestDFSIO benchmark. If the exception is thrown every
time we run it, then yes.

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>             Fix For: 0.15.0
>
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: [jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by Nigel Daley <nd...@yahoo-inc.com>.

Is this a blocker for 0.15?

On Oct 22, 2007, at 10:28 AM, Konstantin Shvachko (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/HADOOP-2087? 
> page=com.atlassian.jira.plugin.system.issuetabpanels:comment- 
> tabpanel#action_12536753 ]
>
> Konstantin Shvachko commented on HADOOP-2087:
> ---------------------------------------------
>
> I think the problem here is that Task  
> task_200710200555_0005_m_000725_0 failed and was killed while it  
> was creating a file.
> Task task_200710200555_0005_m_000725_1 was started to create the  
> same file, but the lease issued for task _0 had not expired yet.
> If task _1 waited for not more than 60 seconds, the lease would  
> expire and it would be able to claim a new lease for this file.
> I think the client should retry file creation if  
> AlreadyBeingCreatedException is caught.
>
>> Errors for subsequent requests for file creation after original  
>> DFSClient goes down..
>> --------------------------------------------------------------------- 
>> ----------------
>>
>>                 Key: HADOOP-2087
>>                 URL: https://issues.apache.org/jira/browse/ 
>> HADOOP-2087
>>             Project: Hadoop
>>          Issue Type: Bug
>>          Components: dfs
>>            Reporter: Gautam Kowshik
>>             Fix For: 0.15.0
>>
>>
>> task task_200710200555_0005_m_000725_0 started writing a file and  
>> the Node went down.. so all following file creation attempts were  
>> returned with AlreadyBeingCreatedException
>> I think the dfs should handle cases wherein, if a dfsclient goes  
>> down between file creation, subsequent creates to the same file  
>> could be allowed.
>> 2007-10-20 06:23:51,189 INFO  
>> org.apache.hadoop.mapred.TaskInProgress: Error from  
>> task_200710200555_0005_m_000725_0: Task  
>> task_200710200555_0005_m_000725_0 failed to report status for 606  
>> seconds. Killing!
>> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker:  
>> Removed completed task 'task_200710200555_0005_m_000725_0' from  
>> '[tracker_address]:/127.0.0.1:44198'
>> 2007-10-20 06:23:51,209 INFO  
>> org.apache.hadoop.mapred.JobInProgress: Choosing normal task  
>> tip_200710200555_0005_m_000725
>> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker:  
>> Adding task 'task_200710200555_0005_m_000725_1' to tip  
>> tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/ 
>> 127.0.0.1:50914'
>> 2007-10-20 06:28:54,991 INFO  
>> org.apache.hadoop.mapred.TaskInProgress: Error from  
>> task_200710200555_0005_m_000725_1:  
>> org.apache.hadoop.ipc.RemoteException:  
>> org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to  
>> create file /benchmarks/TestDFSIO/io_data/test_io_825 for  
>> DFSClient_task_200710200555_0005_m_000725_1 on client  
>> 72.30.50.198, because this file is already being created by  
>> DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal 
>> (FSNamesystem.java:881)
>>         at org.apache.hadoop.dfs.FSNamesystem.startFile 
>> (FSNamesystem.java:806)
>>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown  
>> Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke 
>> (DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536753 ] 

Konstantin Shvachko commented on HADOOP-2087:
---------------------------------------------

I think the problem here is that Task task_200710200555_0005_m_000725_0 failed and was killed while it was creating a file.
Task task_200710200555_0005_m_000725_1 was started to create the same file, but the lease issued for task _0 had not expired yet.
If task _1 waited for not more than 60 seconds, the lease would expire and it would be able to claim a new lease for this file.
I think the client should retry file creation if AlreadyBeingCreatedException is caught.

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>             Fix For: 0.15.0
>
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537454 ] 

Raghu Angadi commented on HADOOP-2087:
--------------------------------------

Thats right. HADOOP-1411 seems to fix this for AlreadyBeingCreated exception.


> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537438 ] 

Hairong Kuang commented on HADOOP-2087:
---------------------------------------

Raghu, the client retries for 5 times with a sleeping period of 1 min in between. This was done in HADOOP-1411.

I did an experiment and it showed that the client retry did not break. I suspect that it might be the server that does not handle the lease expiration correctly.

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536706 ] 

Arun C Murthy commented on HADOOP-2087:
---------------------------------------

We should just rewrite TestDFSIO (and similar tests) to use *reducer NONE* and create/write the files in ${mapred.output.dir} from the map tasks: http://wiki.apache.org/lucene-hadoop/FAQ#9.


> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>             Fix For: 0.15.0
>
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538603 ] 

Hairong Kuang commented on HADOOP-2087:
---------------------------------------

I agree with Mukund. It seems to me that it is possible that task_0 was not cleanly killed.

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Mukund Madhugiri (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukund Madhugiri updated HADOOP-2087:
-------------------------------------

    Description: 
task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 

2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)



  was:

task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 

2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)




I went thru the logs from the latest run and here is the information from the dfsio500 run,
In the JT logs, there are _0 tasks (10 of them) that fail to report status for 600+ seconds and are killed and a respective _1 task is added. This _1 task is the one that fails to write to the file and throws the AlreadyBeingCreatedException. Seems like task _0 was not cleanly killed?

Exception log snippet from NN:
2007-10-25 06:07:52,665 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /benchmarks/TestDFSIO/io_data/test_io_346 for DFSClient_task_200710250537_0003_m_000246_1 on client 72.30.51.88, because this file is already being created by DFSClient_task_200710250537_0003_m_000246_0 on 72.30.53.224
2007-10-25 06:07:52,668 INFO org.apache.hadoop.ipc.Server: IPC Server handler 22 on 8020, call create(/benchmarks/TestDFSIO/io_data/test_io_346, DFSClient_task_200710250537_0003_m_000246_1, true, 3, 134217728) from 72.30.51.88:42152: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_346 for DFSClient_task_200710250537_0003_m_000246_1 on client 72.30.51.88, because this file is already being created by DFSClient_task_200710250537_0003_m_000246_0 on 72.30.53.224
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_346 for DFSClient_task_200710250537_0003_m_000246_1 on client 72.30.51.88, because this file is already being created by DFSClient_task_200710250537_0003_m_000246_0 on 72.30.53.224

Exception log snippet from JT:
2007-10-25 05:44:34,742 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710250537_0003_m_000246_0' to tip tip_200710250537_0003_m_000246, for tracker 'tracker_kry2996.inktomisearch.com:/127.0.0.1:54641'
2007-10-25 06:07:51,219 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710250537_0003_m_000246_0: Task task_200710250537_0003_m_000246_0 failed to report status for 605 seconds. Killing!
2007-10-25 06:07:51,220 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710250537_0003_m_000246_0' from 'tracker_kry2996.inktomisearch.com:/127.0.0.1:54641'
2007-10-25 06:07:51,221 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710250537_0003_m_000246_1' to tip tip_200710250537_0003_m_000246, for tracker 'tracker_kry2778.inktomisearch.com:/127.0.0.1:41150'
2007-10-25 06:09:09,326 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200710250537_0003_m_000246_1' has completed tip_200710250537_0003_m_000246 successfully.
2007-10-25 06:13:16,060 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710250537_0003_m_000246_1' from 'tracker_kry2778.inktomisearch.com:/127.0.0.1:41150'
2007-10-25 06:13:21,374 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710250537_0003_m_000246_0' from 'tracker_kry2996.inktomisearch.com:/127.0.0.1:54641'

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537804 ] 

Hairong Kuang commented on HADOOP-2087:
---------------------------------------

Gautam, could you please check for every successful run of TestDFSIO, was there a failed map task? I just want to check if the described problem occurs whenever there is a failed map task.

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HADOOP-2087.
-----------------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.16.0
         Assignee: Konstantin Shvachko

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>            Assignee: Konstantin Shvachko
>             Fix For: 0.16.0
>
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536746 ] 

dhruba borthakur commented on HADOOP-2087:
------------------------------------------

I guess if we follow the convention described in http://wiki.apache.org/lucene-hadoop/FAQ#9, then the files are written to a temporary location and then moved to the final location. Does it mean that DFSIO will then have to incur the cost of the file(s) rename operation?

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>             Fix For: 0.15.0
>
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2087) Errors for subsequent requests for file creation after original DFSClient goes down..

Posted by "Gautam Kowshik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537249 ] 

Gautam Kowshik commented on HADOOP-2087:
----------------------------------------

Mukund: This bug shows up when a DFSClient goes down during a file creation. It's only when this scenario is hit does the problem show up.

> Errors for subsequent requests for file creation after original DFSClient goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>             Fix For: 0.15.0
>
>
> task task_200710200555_0005_m_000725_0 started writing a file and the Node went down.. so all following file creation attempts were returned with AlreadyBeingCreatedException
> I think the dfs should handle cases wherein, if a dfsclient goes down between file creation, subsequent creates to the same file could be allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_0: Task task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200710200555_0005_m_000725_0' from '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200710200555_0005_m_000725_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_825 for DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because this file is already being created by DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.