Posted to hdfs-user@hadoop.apache.org by Musty Rehmani <mu...@yahoo.com.INVALID> on 2016/03/24 21:32:33 UTC

Re: Upgrading production

Keep a backup of the metadata before the upgrade, preferably on a local machine. Do not finalize the upgrade until you are satisfied with data availability.
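A minimal sketch of such a metadata backup, assuming the NameNode is stopped and that name_dir is one entry of dfs.namenode.name.dir (the directory whose current/ subdirectory holds the fsimage, edits and VERSION files). The function name and any paths you pass in are hypothetical:

```python
import os
import tarfile
import time


def backup_namenode_metadata(name_dir: str, backup_root: str) -> str:
    """Archive a NameNode metadata directory into a timestamped .tar.gz.

    Assumes the NameNode is stopped, so fsimage/edits files do not
    change while they are being read.
    """
    if not os.path.isdir(os.path.join(name_dir, "current")):
        raise FileNotFoundError(f"{name_dir} has no current/ subdirectory")
    os.makedirs(backup_root, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(backup_root, f"nn-metadata-{stamp}.tar.gz")
    # Store paths relative to the parent, e.g. "name/current/VERSION".
    top = os.path.basename(os.path.normpath(name_dir))
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(name_dir, arcname=top)
    return archive
```

The function returns the archive path; copying that file off the NameNode host as well keeps the backup independent of the cluster.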
Musty 

On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com> wrote:

Hi Chathuri!

Technically, there is a rollback option during the upgrade. I don't know how well it has been tested, but the idea is that the old metadata is not deleted until the cluster administrator runs:

    $ hdfs dfsadmin -finalizeUpgrade

I'm fairly confident that the HDFS upgrade itself will work smoothly. We have upgraded quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (without ever having to roll back). It's the applications that run on top of HDFS and YARN that I'd be more concerned about.
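A minimal sketch for checking which state the NameNode storage is in, assuming the classic (non-rolling) upgrade: each NameNode storage directory keeps the pre-upgrade metadata in a previous/ subdirectory until -finalizeUpgrade deletes it. The helper name is hypothetical, and it only looks at NameNode directories (DataNode storage is laid out differently):

```python
import os
from typing import List


def dirs_awaiting_finalize(name_dirs: List[str]) -> List[str]:
    """Return the NameNode storage dirs that still contain previous/.

    Assumes the non-rolling upgrade layout: previous/ holds the
    pre-upgrade metadata and is removed by -finalizeUpgrade, so a
    non-empty result means rollback is still possible there.
    """
    return [d for d in name_dirs if os.path.isdir(os.path.join(d, "previous"))]
```

Running it against every entry of dfs.namenode.name.dir before and after finalizing is a cheap way to confirm whether the old metadata is still there to roll back to.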

HTH
Ravi

On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <ka...@gmail.com> wrote:

Thanks for the information, Ravi. Is there a way I can back up the data before the upgrade? I was thinking about this approach:
   - Copy the current Hadoop directories to a new set of directories.
   - Point Hadoop to this new set.
   - Start the migration with the backup set.
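Copying the directories only works if each node has room for a second copy. A rough pre-check, sketched under the assumption that it runs on each node against that node's own storage directories (dfs.namenode.name.dir or dfs.datanode.data.dir entries); the function names and the 10% headroom are arbitrary choices, not Hadoop settings:

```python
import os
import shutil
from typing import Iterable


def tree_size(path: str) -> int:
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def can_copy(sources: Iterable[str], dest: str, headroom: float = 0.1) -> bool:
    """True if the filesystem holding dest (which must exist) has room
    for all sources plus a safety margin of `headroom` (0.1 = 10%)."""
    needed = sum(tree_size(s) for s in sources)
    free = shutil.disk_usage(dest).free
    return free >= needed * (1 + headroom)
```

Walking several terabytes of block files takes a while, so this is best run ahead of the maintenance window.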
Please let me know if people have done this upgrade successfully. I believe many things can go wrong in a lengthy upgrade like this. The data in the cluster is very important. Thanks,
Chathuri
On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com> wrote:

Hi Chathuri!
   
   - When we upgrade, does it change the namenode data structures and data nodes? I assume it only changes the name node...

It changes the NN as well as the DN layout. In fact, this upgrade will also take a long time on the DataNodes because of https://issues.apache.org/jira/browse/HDFS-6482 (the block ID-based block layout on DataNodes).
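Both the NameNode and each DataNode record their on-disk layout in a VERSION file (current/VERSION under each storage directory), including a layoutVersion property and a storageType such as NAME_NODE or DATA_NODE. A minimal sketch for snapshotting those values before the upgrade and comparing them afterwards, assuming the simple key=value format those files use; the function names are hypothetical:

```python
from typing import Dict


def read_version_file(path: str) -> Dict[str, str]:
    """Parse an HDFS storage VERSION file into a dict.

    Assumes plain key=value lines plus '#' comments (the first line is a
    timestamp comment); Java-properties escaping is not handled.
    """
    props: Dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


def layout_changed(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """True if the storage layout version differs between two snapshots."""
    return int(before["layoutVersion"]) != int(after["layoutVersion"])
```

Per the reply above, layout_changed(old, new) should come out True on the NameNode and on every DataNode once the upgrade has run.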

   
   - What are the risks with this upgrade?


What Hadoop applications do you run on top of your cluster? The hope is that everything continues working smoothly for the most part, but inevitably some backward-incompatible changes creep in.

   
   - Is there a place where I can review the changes made to file system from 2.5.1 to 2.7.2?

See the release notes at http://hadoop.apache.org/releases.html. You'd have to accumulate the changes from every release between the two versions.
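Working out which release notes to read is a simple version-range filter. A minimal sketch, assuming purely numeric dotted version strings and a list you copy by hand from the releases page (nothing is fetched); the function names are hypothetical:

```python
from typing import Iterable, List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'2.6.4' -> (2, 6, 4); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in v.split("."))


def releases_to_review(releases: Iterable[str], current: str, target: str) -> List[str]:
    """Releases newer than `current` and no newer than `target`, oldest first."""
    lo, hi = parse_version(current), parse_version(target)
    picked = [v for v in releases if lo < parse_version(v) <= hi]
    return sorted(picked, key=parse_version)
```

Called as releases_to_review(versions, "2.5.1", "2.7.2"), it returns every release whose notes apply to this jump. The 2.6.x maintenance releases came out alongside 2.7.x, so some fixes may appear in more than one set of notes.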


In practice, I'd try running your applications on an upgraded test cluster first.
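One simple check on the upgraded test cluster (and later on production, before finalizing) is to compare a recursive listing taken before and after the upgrade. A minimal sketch, assuming the listings were saved with something like "hdfs dfs -ls -R / > before.txt" in the usual 8-column format; it compares only paths and lengths, not file contents, and the function names are hypothetical:

```python
from typing import Dict, List, Tuple


def parse_ls_recursive(listing: str) -> Dict[str, int]:
    """Map file path -> length in bytes from `hdfs dfs -ls -R` output.

    Directories (permission string starting with 'd') are skipped, as is
    any line that does not split into the 8 expected columns.
    """
    files: Dict[str, int] = {}
    for line in listing.splitlines():
        parts = line.split(None, 7)  # maxsplit=7 keeps paths with spaces intact
        if len(parts) != 8 or parts[0].startswith("d"):
            continue
        files[parts[7]] = int(parts[4])
    return files


def diff_manifests(before: Dict[str, int], after: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (paths missing after the upgrade, paths whose length changed)."""
    missing = sorted(p for p in before if p not in after)
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return missing, changed
```

Two empty lists mean every file listed before the upgrade is still there with the same length; a content-level check would need more than this.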

HTH


Ravi


On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <ka...@gmail.com> wrote:

Hi, 
We have a production Hadoop deployment with 1 NameNode and 10 DataNodes, holding more than 20 TB of data in HDFS. We are currently on Hadoop 2.5.1 and want to upgrade to the latest Hadoop version, 2.7.2.
I followed the rolling upgrade guide (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html) and upgraded a single-node system running in pseudo-distributed mode without any issues. However, that system did not have nearly as much data as the production system.
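On a cluster with 10 DataNodes, the guide's per-DataNode steps repeat many times, so a dry-run listing of the dfsadmin commands in order can help review the procedure before touching production. This sketch assumes a non-HA cluster, the Hadoop 2.x default DataNode IPC port (50020, from dfs.datanode.ipc.address), and hypothetical hostnames. It executes nothing; the NameNode restart with the "-rollingUpgrade started" option and the per-node software upgrade stay manual. The guide also suggests upgrading a small subset of DataNodes (for example, one rack) at a time.

```python
from typing import List

# Assumption: Hadoop 2.x default port of dfs.datanode.ipc.address.
DEFAULT_DN_IPC_PORT = 50020


def rolling_upgrade_plan(
    datanodes: List[str], ipc_port: int = DEFAULT_DN_IPC_PORT
) -> List[List[str]]:
    """Build (but do not run) the dfsadmin commands for a rolling upgrade.

    The NameNode/SecondaryNameNode restart and the DataNode software
    upgrade/restart happen outside this list.
    """
    plan = [
        ["hdfs", "dfsadmin", "-rollingUpgrade", "prepare"],
        # Re-run "query" until it prints "Proceed with rolling upgrade".
        ["hdfs", "dfsadmin", "-rollingUpgrade", "query"],
    ]
    for host in datanodes:
        addr = f"{host}:{ipc_port}"
        plan.append(["hdfs", "dfsadmin", "-shutdownDatanode", addr, "upgrade"])
        # Poll until the DataNode stops responding, then upgrade and restart it.
        plan.append(["hdfs", "dfsadmin", "-getDatanodeInfo", addr])
    plan.append(["hdfs", "dfsadmin", "-rollingUpgrade", "finalize"])
    return plan


for cmd in rolling_upgrade_plan(["dn01.example.com", "dn02.example.com"]):
    print(" ".join(cmd))
```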
Since this is a production system, I'm reluctant to do this upgrade. I would like to hear what other people have done in similar cases and how it went. Here are a few questions I have:
   - When we upgrade, does it change the namenode data structures and data nodes? I assume it only changes the name node...
   - What are the risks with this upgrade?
   - Is there a place where I can review the changes made to file system from 2.5.1 to 2.7.2?
I would really appreciate it if you could share your experiences.
Thanks in advance,
Chathuri