Posted to common-dev@hadoop.apache.org by "Christian Kunz (JIRA)" <ji...@apache.org> on 2007/07/30 16:54:53 UTC

[jira] Created: (HADOOP-1664) Hadoop DFS upgrade procedure

Hadoop DFS upgrade procedure
----------------------------

                 Key: HADOOP-1664
                 URL: https://issues.apache.org/jira/browse/HADOOP-1664
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.14.0
            Reporter: Christian Kunz


When upgrading from a July-9 to a July-25 nightly release, we were able to upgrade successfully on a single-node cluster, but the upgrade failed on a 10-node and a 200-node cluster.
As we are not sure whether we made a mistake, I am filing this as an improvement. But going forward it is imperative that there be a safe and well-documented procedure to upgrade dfs without loss of data, including a rollback procedure and a list of operational procedures that are irreversibly destructive (hopefully an empty list).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HADOOP-1664) Hadoop DFS upgrade procedure

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi resolved HADOOP-1664.
----------------------------------

    Resolution: Cannot Reproduce

We have never seen this behavior again. 0.14.x has gone through many upgrades, large and small. 


[jira] Commented: (HADOOP-1664) Hadoop DFS upgrade procedure

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516525 ] 

Christian Kunz commented on HADOOP-1664:
----------------------------------------

I will send you the location offline. BTW: I tried the dfsadmin command 'upgradeProgress' during that time, and it reported 0.0% progress.


[jira] Commented: (HADOOP-1664) Hadoop DFS upgrade procedure

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516431 ] 

Raghu Angadi commented on HADOOP-1664:
--------------------------------------

I am writing an admin guide for upgrading to Hadoop-0.14 and will post it in a couple of days. If you have any logs, please add them here. The upgrade and rollback procedure is the same as before.


[jira] Commented: (HADOOP-1664) Hadoop DFS upgrade procedure

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516488 ] 

Christian Kunz commented on HADOOP-1664:
----------------------------------------

Datanode servers were apparently successful in upgrading:
...
2007-07-26 10:35:34,973 INFO org.apache.hadoop.dfs.DataNode:
   Distributed upgrade for DataNode version -6 to current LV -7 is initialized.
2007-07-26 10:35:34,974 INFO org.apache.hadoop.dfs.Storage: Upgrading storage directory <hadoop-dir>/dfs/data.
   old LV = -5; old CTime = 1183153812398.
   new LV = -7; new CTime = 1185471333047
2007-07-26 10:36:58,098 INFO org.apache.hadoop.dfs.Storage: Upgrade of /<hadoop-dir>/dfs/data is complete.
2007-07-26 10:36:58,587 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010
...

but the namenode server reported 0% upgrade progress long after that:

2007-07-26 10:43:04,818 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: Upgrade still running.
                                 Avg completion on Datanodes: 0.00% with 0 errors.

Even after 40 minutes there was no change in the reported status, the namenode was still in safe mode, and when I tried to force it to leave safe mode, it refused:

hadoop dfsadmin -safemode leave
safemode: org.apache.hadoop.dfs.SafeModeException: Distributed upgrade is in progress. Name node is in safe mode.
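
For anyone who wants to watch for this situation automatically, here is a minimal Python sketch that pulls the completion percentage out of namenode status lines like the one quoted above and flags a run of identical, incomplete reports. The function names and the "three identical reports" rule are assumptions made for illustration only; they are not part of Hadoop, and the pattern is based solely on the single log line shown in this comment.

```python
import re

# Matches the status line quoted above, e.g.
#   "Avg completion on Datanodes: 0.00% with 0 errors."
# The pattern is derived from that one sample only (assumption).
PROGRESS_RE = re.compile(
    r"Avg completion on Datanodes:\s*(\d+(?:\.\d+)?)%\s+with\s+(\d+)\s+errors"
)


def parse_upgrade_progress(log_text):
    """Return a list of (percent, errors) tuples, one per status report found."""
    return [(float(pct), int(errs)) for pct, errs in PROGRESS_RE.findall(log_text)]


def looks_stalled(reports, min_reports=3):
    """Hypothetical heuristic: treat the upgrade as stalled when the last
    `min_reports` reports all show the same percentage below 100."""
    if len(reports) < min_reports:
        return False
    recent = [pct for pct, _ in reports[-min_reports:]]
    return recent[0] < 100.0 and all(pct == recent[0] for pct in recent)


# The status report from the namenode log above, repeated to simulate
# several polls that show no change.
sample = (
    "2007-07-26 10:43:04,818 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: "
    "Upgrade still running.\n"
    "                                 Avg completion on Datanodes: 0.00% with 0 errors.\n"
)
reports = parse_upgrade_progress(sample * 3)
print(reports, looks_stalled(reports))
```

A check like this only says that the reported number is not moving; it does not explain why the datanodes have not joined the distributed upgrade.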




[jira] Updated: (HADOOP-1664) Hadoop DFS upgrade procedure

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-1664:
---------------------------------

    Attachment: datanode.log.txt

The namenode log looks fine. It starts the CRC upgrade and waits for the datanodes to start the same and join. But for some reason, the datanodes don't start the CRC upgrade. I am not sure what was going on. If you are ever able to reproduce this, please let me know.

I am attaching the relevant part of one datanode's log.



[jira] Commented: (HADOOP-1664) Hadoop DFS upgrade procedure

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516493 ] 

Raghu Angadi commented on HADOOP-1664:
--------------------------------------

If possible, I would like to look at the full logs of a datanode and the namenode. There is a new dfsadmin command, 'upgradeProgress'.
