Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2007/10/08 23:06:50 UTC

[jira] Commented: (HADOOP-1999) DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'

    [ https://issues.apache.org/jira/browse/HADOOP-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533201 ] 

Konstantin Shvachko commented on HADOOP-1999:
---------------------------------------------

Finalize removes the hard links previously created by upgrade. The removal is done in a separate thread, but if there are a lot of blocks, 
the data-nodes are likely to be blocked on I/O, so data transmission will be slow. This is what you observed here. 
A solution would be to remove the links lazily, e.g. remove 100 files per second or so. Finalizing will then take longer, but 
the data-nodes will be able to proceed with normal activities.
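
As a rough illustration of the lazy removal idea, here is a minimal sketch that deletes files in batches and sleeps out the rest of each one-second window. It is not Hadoop code: the class name ThrottledLinkRemover, the method removeThrottled and the filesPerSecond parameter are all hypothetical, and a real data-node would run this in its own background thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): deletes files at a bounded rate. */
public class ThrottledLinkRemover {

    /**
     * Deletes the given files in batches of at most filesPerSecond,
     * sleeping out the remainder of each one-second window so that
     * other disk activity can make progress between batches.
     *
     * @return the number of files actually removed
     */
    public static int removeThrottled(List<Path> files, int filesPerSecond)
            throws IOException, InterruptedException {
        if (filesPerSecond <= 0) {
            throw new IllegalArgumentException("filesPerSecond must be positive");
        }
        int removed = 0;
        for (int start = 0; start < files.size(); start += filesPerSecond) {
            long windowStart = System.nanoTime();
            int end = Math.min(start + filesPerSecond, files.size());

            // Delete one batch; files that are already gone are skipped.
            for (Path p : files.subList(start, end)) {
                if (Files.deleteIfExists(p)) {
                    removed++;
                }
            }

            // Pause only if more batches remain.
            if (end < files.size()) {
                long elapsedMs = (System.nanoTime() - windowStart) / 1_000_000L;
                long remainingMs = 1000L - elapsedMs;
                if (remainingMs > 0) {
                    Thread.sleep(remainingMs);
                }
            }
        }
        return removed;
    }
}
```

With filesPerSecond set to 100 this matches the rate suggested above. The trade-off is the same as described: finalize takes at least (number of files / 100) seconds, but each batch leaves most of the second free for normal block traffic.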

The jstack you attached: I do not see the data-node doing any file deletes. Are you sure this thread dump was taken 
during finalize? I do see that one of the threads is running DU, though. Could the slowdown be related to HADOOP-1946?
Before that was fixed I saw drastic slowdowns of data-nodes; some of them would become dead even under insignificant load. 
Finalize would make things even worse.

Missing blocks: I suspect that you get these because many I/O operations did not complete. Some blocks were not replicated,
and some files were not closed.

> DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-1999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1999
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.0
>         Environment: Sep 14 nightly build
>            Reporter: Christian Kunz
>            Priority: Critical
>         Attachments: jstack.datanode
>
>
> I restarted the namenode with the -upgrade option, started a few scripts running the hadoop command line utility to upload a few files into dfs, and at some point ran
> hadoop dfsadmin -finalizeUpgrade.
> At that point all the dfs clients I had started earlier got stuck during block transmission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.