Posted to common-dev@hadoop.apache.org by "Christian Kunz (JIRA)" <ji...@apache.org> on 2007/10/05 23:23:50 UTC

[jira] Created: (HADOOP-1999) DFSClients get stuck when running 'dfsadmin finalizeUpgrade'

DFSClients get stuck when running 'dfsadmin finalizeUpgrade'
------------------------------------------------------------

                 Key: HADOOP-1999
                 URL: https://issues.apache.org/jira/browse/HADOOP-1999
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.15.0
         Environment: Sep 14 nightly build
            Reporter: Christian Kunz


I restarted the namenode with the -upgrade option, started a few scripts that run the hadoop command-line utility to upload files into DFS, and at some point ran

hadoop dfsadmin -finalizeUpgrade

At that point, all the DFS clients I had started earlier got stuck during block transmission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1999) DFSClients get stuck when running 'dfsadmin finalizeUpgrade'

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kunz updated HADOOP-1999:
-----------------------------------

    Attachment: jstack.datanode



[jira] Updated: (HADOOP-1999) DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kunz updated HADOOP-1999:
-----------------------------------

    Priority: Major  (was: Critical)

I will retest this issue using a more recent release. 



[jira] Commented: (HADOOP-1999) DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533201 ] 

Konstantin Shvachko commented on HADOOP-1999:
---------------------------------------------

Finalize removes the hard links previously created by the upgrade. The removal is done in a separate thread, but if there are a lot of blocks,
the data-nodes are likely to be blocked on I/O, so data transmission will be slow. This is what you observed here.
A solution would be to remove the links lazily, e.g. remove about 100 files per second. Finalizing would then take longer, but
the data-nodes would be able to proceed with their normal activities.
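The lazy-removal idea above can be sketched as a rate-limited delete loop. This is a minimal illustration, not Hadoop's finalize code: the function name `remove_links_lazily`, the `previous`/`current` directory names, and the fixed one-second pause per batch are assumptions chosen for the example. It also shows why the deletes are safe: unlinking one hard link leaves the block data reachable through the other name.

```python
import os
import tempfile
import time
from typing import Callable, Iterable, List, Tuple


def remove_links_lazily(
    paths: Iterable[str],
    files_per_second: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Unlink `paths` in batches of `files_per_second`, pausing 1 s after each batch.

    Hypothetical helper: a sketch of throttled deletion, not Hadoop code.
    """
    if files_per_second <= 0:
        raise ValueError("files_per_second must be positive")
    removed: List[str] = []
    in_batch = 0
    for path in paths:
        # Unlinking a hard link drops one directory entry; the data stays
        # on disk as long as another link (the live block file) remains.
        os.remove(path)
        removed.append(path)
        in_batch += 1
        if in_batch == files_per_second:
            sleep(1.0)  # give the disk back to normal block traffic
            in_batch = 0
    return removed


def demo(num_blocks: int = 250) -> Tuple[int, int, int]:
    """Build current/ plus a hard-linked previous/, then delete previous/ lazily.

    Returns (links removed, pauses taken, block files still in current/).
    """
    with tempfile.TemporaryDirectory() as root:
        current = os.path.join(root, "current")
        previous = os.path.join(root, "previous")
        os.makedirs(current)
        os.makedirs(previous)
        for i in range(num_blocks):
            block = os.path.join(current, f"blk_{i}")
            with open(block, "wb") as f:
                f.write(b"block data")
            # Assumed upgrade-style snapshot: a second name for the same file.
            os.link(block, os.path.join(previous, f"blk_{i}"))
        pauses: List[float] = []
        links = sorted(os.path.join(previous, name) for name in os.listdir(previous))
        removed = remove_links_lazily(links, files_per_second=100, sleep=pauses.append)
        return len(removed), len(pauses), len(os.listdir(current))


if __name__ == "__main__":
    print(demo())  # prints (250, 2, 250)
```

Because the pause comes after each full batch, the effective rate never exceeds `files_per_second`; time the disk spends on the deletes themselves only slows it further. Injecting `sleep` lets a test count the pauses instead of actually waiting.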

Regarding the jstack you attached: I do not see the data-node doing any file deletes. Are you sure this thread dump was taken
during finalize? I do see that one of the threads is running DU, though. Could the slowdown be related to HADOOP-1946?
Before that was fixed, I saw drastic slowdowns of data-nodes; some of them would become dead even under insignificant load.
Finalize would make things even worse.
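To see how blocked I/O can turn into a dead node, here is a toy model of heartbeat expiry. It assumes the NameNode declares a DataNode dead once no heartbeat has arrived within some expiry window, and that heartbeats stall while the DataNode thread sending them is stuck behind disk I/O. `HeartbeatMonitor`, the 3-second heartbeat interval, and the 600-second window are illustrative values, not Hadoop's configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Toy master-side tracker: a node is dead after `expiry_seconds` of silence.

    Hypothetical class; the expiry value is illustrative, not an HDFS default.
    """

    expiry_seconds: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the time of the most recent heartbeat from `node`.
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        # Nodes whose last heartbeat is older than the expiry window.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.expiry_seconds
        )


def demo() -> List[str]:
    monitor = HeartbeatMonitor(expiry_seconds=600.0)
    for t in range(0, 700, 3):  # both nodes try to heartbeat every 3 s
        monitor.heartbeat("dn1", float(t))
        if t <= 30:  # dn2's heartbeats stop after t = 30 s (assumed I/O stall)
            monitor.heartbeat("dn2", float(t))
    return monitor.dead_nodes(now=700.0)


if __name__ == "__main__":
    print(demo())  # prints ['dn2']
```

In this model the monitor cannot tell a node that is merely stalled from one that has crashed; once the silence outlasts the window, both look the same.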

Missing blocks: I suspect you get these because many I/O operations did not complete. Some blocks were not replicated,
and some files were not closed.



[jira] Updated: (HADOOP-1999) DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kunz updated HADOOP-1999:
-----------------------------------

    Priority: Critical  (was: Major)
     Summary: DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'  (was: DFSClients get stuck when running 'dfsadmin finalizeUpgrade')

The situation was worse than initially reported. Some DataNodes became dead nodes at about the same time, and some blocks went missing as a result. I attached the jstack of one of the dead DataNodes.

Concerning missing blocks, it looks to me as if the block id goes missing as well (at least fsck no longer reports it). If this is true, I would argue that it should be fixed, because several DataNodes can always go offline at the same time due to bugs or network problems, and they could be brought back online just by restarting them.
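The argument about missing blocks can be illustrated with a small classification sketch. Given a map from block id to the DataNodes believed to hold a replica, a block is missing when none of those nodes is live. Keeping such ids in the report, rather than forgetting them, is what would let the blocks come back if the dead nodes are simply restarted. `classify_blocks` and its inputs are hypothetical, not the NameNode's data structures; the default target replication of 3 mirrors HDFS's usual default but is only a parameter here.

```python
from typing import Dict, List, Set, Tuple


def classify_blocks(
    replicas: Dict[str, Set[str]],
    live_nodes: Set[str],
    target_replication: int = 3,
) -> Tuple[List[str], List[str]]:
    """Return (under_replicated, missing) block ids, each sorted.

    Hypothetical helper. A block is missing when none of its known replica
    locations is live; its id is still reported so it can be recovered if
    the dead nodes rejoin.
    """
    under_replicated: List[str] = []
    missing: List[str] = []
    for block_id, nodes in sorted(replicas.items()):
        live_copies = len(nodes & live_nodes)
        if live_copies == 0:
            missing.append(block_id)
        elif live_copies < target_replication:
            under_replicated.append(block_id)
    return under_replicated, missing


if __name__ == "__main__":
    replicas = {
        "blk_1": {"dn1", "dn2", "dn3"},
        "blk_2": {"dn2", "dn3"},
        "blk_3": {"dn1", "dn4", "dn5"},
    }
    # dn2 and dn3 go dead at about the same time.
    print(classify_blocks(replicas, {"dn1", "dn4", "dn5"}))
    # prints (['blk_1'], ['blk_2'])
    # Restarting dn2 brings blk_2 back, because its id was never dropped.
    print(classify_blocks(replicas, {"dn1", "dn2", "dn4", "dn5"}))
    # prints (['blk_1', 'blk_2'], [])
```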



[jira] Commented: (HADOOP-1999) DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533578 ] 

Christian Kunz commented on HADOOP-1999:
----------------------------------------

Konstantin, I took the jstack long after running -finalizeUpgrade, and the namenode had declared the datanode dead a long time before that.
But the issue might be fixed by HADOOP-1946.
