Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/05/02 19:58:55 UTC

[jira] Created: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

DFS Write pipeline does not detect defective datanode correctly if it times out.
--------------------------------------------------------------------------------

                 Key: HADOOP-3339
                 URL: https://issues.apache.org/jira/browse/HADOOP-3339
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.16.0
            Reporter: Raghu Angadi
            Assignee: Raghu Angadi



When DFSClient is writing to DFS, it does not correctly detect the culprit datanode (rather, the datanodes do not report it properly). Say the last datanode in a 3-node pipeline is too slow or defective. In this case, the pipeline removes the first two datanodes in the first two attempts. The third attempt has only the 3rd datanode in the pipeline, and it fails too. If the pipeline detected the bad 3rd node when the first failure occurred, the write would succeed in the second attempt. 

I will attach example logs of such cases. I think this should be fixed in 0.17.x.
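
To make the failure mode concrete, here is a small self-contained sketch (hypothetical names only, not the actual DFSClient recovery code) of how blaming the first node in the pipeline exhausts all three replicas, while blaming the node that actually timed out would let the write succeed on the next attempt:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the real DFSClient pipeline-recovery logic.
public class PipelineRecoverySketch {
  public static void main(String[] args) {
    List<String> pipeline = new ArrayList<>(List.of("dn1", "dn2", "dn3"));
    String badNode = "dn3";             // the slow/defective last datanode
    boolean blameTimedOutNode = false;  // current behaviour blames the wrong node

    for (int attempt = 1; !pipeline.isEmpty(); attempt++) {
      if (!pipeline.contains(badNode)) {
        System.out.println("attempt " + attempt + ": write succeeds with " + pipeline);
        return;
      }
      // Current behaviour: the error surfaces against the first node, so dn1
      // and dn2 are dropped before dn3 is ever suspected.
      int blamed = blameTimedOutNode ? pipeline.size() - 1 : 0;
      System.out.println("attempt " + attempt + ": failed, removing " + pipeline.get(blamed));
      pipeline.remove(blamed);
    }
    System.out.println("pipeline exhausted: hard write failure");
  }
}
{code}

Flipping blameTimedOutNode to true drops dn3 on the first failure, and the write succeeds on attempt 2, which is the behaviour this issue is asking for.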


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593908#action_12593908 ] 

Raghu Angadi commented on HADOOP-3339:
--------------------------------------

TestDatanodeDeath tests killing different datanodes in the pipeline, and it works. The main difference is that how a downstream datanode's error is handled depends on which of the two threads (the main data receiver or the "PacketResponder") detects it. 

In TestDatanodeDeath, it is always the PacketResponder that detects it. But when a downstream datanode times out (or when the connection is busy), it is the main IO thread that detects it. The fix I am thinking of is to make the main thread inform the 'PacketResponder' about the failure.
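
A minimal sketch of that hand-off (hypothetical class and method names, not the actual BlockReceiver/PacketResponder code): the main receiver thread records which mirror failed in shared state, so the responder can report the right node upstream instead of only learning about the problem through an interrupt:

{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed hand-off; not the real datanode classes.
class DownstreamFailureSketch {
  /** Index of the first failed downstream datanode in the pipeline, or -1 if none. */
  private final AtomicInteger firstBadLink = new AtomicInteger(-1);

  /** Main receiver thread: called when an IO to the mirror fails or times out. */
  void mirrorFailed(int mirrorIndex, IOException cause) {
    // Record the failure for the responder instead of only interrupting it.
    firstBadLink.compareAndSet(-1, mirrorIndex);
  }

  /** Responder thread: consulted before sending the ack upstream. */
  int badLinkForAck() {
    return firstBadLink.get();
  }
}
{code}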




[jira] Updated: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3339:
-------------------------------------------

    Hadoop Flags: [Reviewed]

+1 patch looks good.



[jira] Updated: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3339:
---------------------------------

    Attachment: HADOOP-3339.patch

The attached patch fixes the main problem described in the description: it properly handles the case where the 3rd datanode (or 4th, etc.) fails. Regarding a failure at the 2nd datanode, that needs a fix in DFSClient and I don't have a fix for it yet. I will file another jira for that.
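
For the client-side half (a rough sketch with hypothetical names, not the real DFSClient logic): once the acks carry a per-datanode status, the client could exclude the first node that actually reported an error rather than defaulting to the head of the pipeline:

{code:java}
// Hypothetical sketch: choose which datanode to drop from per-node ack statuses.
class AckSketch {
  enum Status { SUCCESS, ERROR }

  /**
   * Returns the index of the first datanode that reported an error, or the
   * first node we received no reply from (it is the most likely culprit).
   */
  static int nodeToExclude(Status[] replies, int pipelineSize) {
    for (int i = 0; i < replies.length; i++) {
      if (replies[i] == Status.ERROR) {
        return i;
      }
    }
    return Math.min(replies.length, pipelineSize - 1);
  }
}
{code}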




[jira] Updated: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3339:
------------------------------------

    Release Note: 
Improved failure handling of last Data Node in write pipeline.


  was:
Some of the failures on the 3rd datanode in the DFS write pipeline are not detected properly. This could lead to a hard failure of the client's write operation.





[jira] Updated: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3339:
---------------------------------

    Attachment: tmp-3339-dn.patch


The attached patch fixes the main problem described (practically all the time). It properly informs upstream about the downstream failure.

A similar problem exists on the client side as well: if the 2nd datanode times out, most of the time the client removes the first datanode as the bad one. The issues on the DataNode and the Client are similar, but the same fix cannot work for both, because on the DataNode the responder needs to properly write its state upstream, while the Client needs to properly read all the remaining data on the socket from the first datanode.

The main issue is that the BlockReceiver thread (and the DataStreamer in the case of DFSClient) {{interrupt()}}s the 'responder' thread. But interrupting is a pretty coarse control: we don't know what state the responder is in, and interrupting has different effects depending on that state. To fix this properly we need to redesign how we handle these interactions.
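
A sketch of the direction this implies (hypothetical names, purely to illustrate replacing a bare interrupt() with explicit shared state): the receiver publishes why it is stopping the responder, and the responder decides how to shut down based on that reason rather than guessing from the interrupt alone:

{code:java}
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration; not the actual BlockReceiver/DataStreamer code.
class ResponderSignalSketch {
  enum StopReason { NONE, LOCAL_ERROR, DOWNSTREAM_ERROR, FINISHED }

  private final AtomicReference<StopReason> stopReason =
      new AtomicReference<>(StopReason.NONE);

  /** Receiver/streamer thread: record the reason, then wake the responder. */
  void stopResponder(Thread responder, StopReason reason) {
    stopReason.compareAndSet(StopReason.NONE, reason);
    responder.interrupt();  // still wakes it up, but now it knows why
  }

  /** Responder thread: consult the recorded reason instead of guessing. */
  void onInterrupted() {
    switch (stopReason.get()) {
      case DOWNSTREAM_ERROR:
        // report the downstream failure upstream before shutting down
        break;
      case LOCAL_ERROR:
      case FINISHED:
      default:
        // plain shutdown
        break;
    }
  }
}
{code}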

I am trying out a fix for DFSClient.



[jira] Updated: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-3339:
--------------------------------

    Fix Version/s: 0.18.0

Assigning to 0.18. This isn't a blocker for 0.17.



[jira] Updated: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3339:
---------------------------------

    Status: Patch Available  (was: Open)

Thanks Nicholas.



[jira] Updated: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3339:
---------------------------------

      Resolution: Fixed
    Release Note: 
Some of the failures on the 3rd datanode in the DFS write pipeline are not detected properly. This could lead to a hard failure of the client's write operation.

          Status: Resolved  (was: Patch Available)

I just committed this.



[jira] Commented: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598036#action_12598036 ] 

Raghu Angadi commented on HADOOP-3339:
--------------------------------------

The test failure is another case of HADOOP-3354 and is not related to this patch. Also, HADOOP-3416 has been filed regarding the DFSClient.



[jira] Commented: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597702#action_12597702 ] 

Hadoop QA commented on HADOOP-3339:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12382218/HADOOP-3339.patch
  against trunk revision 656939.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2496/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2496/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2496/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2496/console

This message is automatically generated.



[jira] Commented: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594497#action_12594497 ] 

dhruba borthakur commented on HADOOP-3339:
------------------------------------------

+1 on Raghu's proposal. 



[jira] Updated: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3339:
---------------------------------

    Description: 
When DFSClient is writing to DFS, it does not correctly detect the culprit datanode (rather, the datanodes do not report it properly) if the bad node times out. Say the last datanode in a 3-node pipeline is too slow or defective. In this case, the pipeline removes the first two datanodes in the first two attempts. The third attempt has only the 3rd datanode in the pipeline, and it fails too. If the pipeline detected the bad 3rd node when the first failure occurred, the write would succeed in the second attempt. 

I will attach example logs of such cases. I think this should be fixed in 0.17.x.


  was:

When DFSClient is writing to DFS, it does not correctly detect the culprit datanode (rather, the datanodes do not report it properly). Say the last datanode in a 3-node pipeline is too slow or defective. In this case, the pipeline removes the first two datanodes in the first two attempts. The third attempt has only the 3rd datanode in the pipeline, and it fails too. If the pipeline detected the bad 3rd node when the first failure occurred, the write would succeed in the second attempt. 

I will attach example logs of such cases. I think this should be fixed in 0.17.x.





[jira] Commented: (HADOOP-3339) DFS Write pipeline does not detect defective datanode correctly if it times out.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598995#action_12598995 ] 

Hudson commented on HADOOP-3339:
--------------------------------

Integrated in Hadoop-trunk #499 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/499/])
