You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Nigel Daley (JIRA)" <ji...@apache.org> on 2007/05/22 06:58:16 UTC

[jira] Created: (HADOOP-1411) AlreadyBeingCreatedException from task retries

AlreadyBeingCreatedException from task retries
----------------------------------------------

                 Key: HADOOP-1411
                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.13.0
            Reporter: Nigel Daley
         Assigned To: dhruba borthakur
            Priority: Blocker
             Fix For: 0.13.0


HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.

Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1411:
------------------------------

    Attachment: createRetry-tw2.patch

New patch with unit test.

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry-tw.patch, createRetry-tw2.patch, createRetry.patch, createRetry1.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499302 ] 

Hadoop QA commented on HADOOP-1411:
-----------------------------------

Integrated in Hadoop-Nightly #101 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/101/)

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry-tw.patch, createRetry-tw2.patch, createRetry.patch, createRetry1.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498666 ] 

Tom White commented on HADOOP-1411:
-----------------------------------

Hairong,

+0

While the patch will solve the problem, I think if possible it would be better not to have a special case for ipc.RemoteException embedded in RetryPolicies.retryByException. Instead we could have another static factory, RetryPolicies.retryByRemoteException say, that should be used for remote interfaces. Also, note that the signature for this doesn't need to have a Map keyed by String: keying by Exception provides more type safety (at the slight cost of creating a new Map keyed by String for internal use) and consistency with RetryPolicies.retryByException.

What do you think?

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499217 ] 

Owen O'Malley commented on HADOOP-1411:
---------------------------------------

I think the underlying source of the problem is that the Map parameter is too big. I think it would be better if the handlers were done more explicitly as:
{code}
  addHandler(Class<? extends Exception>, RetryHandler);
{code}
But I think this patch should go in since it fixes the problem and we can address any further concerns later.

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry-tw.patch, createRetry-tw2.patch, createRetry.patch, createRetry1.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1411:
------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I've just committed this.  Thanks Hairong!

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry-tw.patch, createRetry-tw2.patch, createRetry.patch, createRetry1.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1411:
------------------------------

    Status: Patch Available  (was: Open)

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry-tw.patch, createRetry-tw2.patch, createRetry.patch, createRetry1.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1411:
----------------------------------

    Attachment: createRetry.patch

Tom, could you please review the patch to see if you are happy with my changes to the io retry framework?

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1411:
------------------------------

    Attachment: createRetry-tw.patch

>I agree this interface is cleaner. But one disadvantage is that it is harder for a client to use the framework to support RemoteException. A client needs to build two exception2Policy maps and shouldRetry needs to be recusively called one more time to search one more map.

It is slightly harder for clients to set up, but I would argue it is clearer. I wouldn't worry about the cost of an extra call to shouldRetry since it will be dwarfed by the time between retries.

>I agree that keying by Exception provides more type safety. But again calling Class.forName is more costly than calling Class.getName. That's why I perfer usign class name as the map's key. Alternatively we could enforce type safety by providing an "add" handler to the map with Exception as a parameter and internally implements the map using the class name as the key. But we could do it in a different jira.

Agreed. I was imagining turning the Map<Exception, RetryPolicy> to a Map<String, RetryPolicy> on construction of RemoteExceptionDependentRetry rather then using Class.getName. I've attached a patch which does this. Does this work OK for you?

Finally, it would be nice to have a test for this new behaviour.


> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry-tw.patch, createRetry.patch, createRetry1.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1411:
----------------------------------

    Attachment: createRetry1.patch

The new patch reflects Tom's suggestions.

> I think if possible it would be better not to have a special case for ipc.RemoteException embedded in RetryPolicies.retryByException. Instead we could have another static factory, RetryPolicies.retryByRemoteException say, that should be used for remote interfaces.

I agree this interface is cleaner. But one disadvantage is that it is harder for a client to use the framework to support RemoteException. A client needs to build two exception2Policy maps and shouldRetry needs to be recusively called one more time to search one more map.

> Also, note that the signature for this doesn't need to have a Map keyed by String: keying by Exception provides more type safety (at the slight cost of creating a new Map keyed by String for internal use) and consistency with RetryPolicies.retryByException.

I agree that keying by Exception provides more type safety. But again calling Class.forName is more costly than calling Class.getName. That's why I perfer usign class name as the map's key. Alternatively we could enforce type safety by providing an "add" handler to the map with Exception as a parameter and internally implements the map using the class name as the key. But we could do it in a different jira.


> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry.patch, createRetry1.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499232 ] 

Hadoop QA commented on HADOOP-1411:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12358232/createRetry-tw2.patch applied and successfully tested against trunk revision r541754.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/205/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/205/console

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: createRetry-tw.patch, createRetry-tw2.patch, createRetry.patch, createRetry1.patch
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498073 ] 

Hairong Kuang commented on HADOOP-1411:
---------------------------------------

I did put some logic in the patch to HADOOP-1263 to handle retries for AlreadyBeingCreatedException. But it turned out it did not work. The problem is that any IPC call returns only RemoteException. When a server operation throws AlreadyBeingCreatedException, the client only gets RemoteException and it has to examine the content of RemoteException to find out the real exception. So the DFSClient retry framework implemented in HADOOP-1263 never catches an AlreadyBeingCreatedException and therefore it never gets retried.

I'd like to propose the following changes to the general retry framework so it is able to handle RemoteException well:
Method shouldRetry of RetryPolicies.exceptionDependentRetry checks if the exception is a RemoteException. If yes, find out the retry policy for the real exception from the exceptionToPolicyMap. Because RemoteException contains only the class name of the real exception, I would also propose to change the exceptionToPolicyMap to map an exception class name to a retry policy. Currently it is a map from a exception class to a retry policy. 

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-1411) AlreadyBeingCreatedException from task retries

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley reassigned HADOOP-1411:
-----------------------------------

    Assignee: Hairong Kuang  (was: dhruba borthakur)

> AlreadyBeingCreatedException from task retries
> ----------------------------------------------
>
>                 Key: HADOOP-1411
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1411
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> HADOOP-1407 indicates 2 bugs: a mapred bug which will be fixed as part of 1407, and a DFSClient bug that will be fixed here.
> Note that the test run in 1407 was without speculation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.