Posted to mapreduce-user@hadoop.apache.org by Chathuri Wimalasena <ka...@gmail.com> on 2016/03/23 20:17:41 UTC

Upgrading production hadoop system from 2.5.1 to 2.7.2

Hi,

We have a Hadoop production deployment with 1 NameNode and 10 DataNodes,
which holds more than 20TB of data in HDFS. We are currently using Hadoop
2.5.1 and want to upgrade to the latest Hadoop version, 2.7.2.

I followed the guide at
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
and upgraded a single-node system running in pseudo-distributed mode, and
it went without any issues. However, that system did not have nearly as
much data as the production system.

Since this is a production system, I'm reluctant to do this upgrade. I would
like to hear what other people have done in these cases and what their
experiences were. Here are a few questions I have:

   - When we upgrade, does it change the data structures on the NameNode
   and the DataNodes? I assume it only changes the NameNode.
   - What are the risks with this upgrade?
   - Is there a place where I can review the changes made to the file
   system from 2.5.1 to 2.7.2?

I would really appreciate it if you could share your experiences.

Thanks in advance,
Chathuri

Re: Upgrading production

Posted by Musty Rehmani <mu...@yahoo.com.INVALID>.
Back up the NameNode metadata before the upgrade, preferably to a local machine. Do not finalize the upgrade until you are satisfied that your data is available.
Musty 
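Musty's first step could look roughly like the Python sketch below. It is a minimal sketch, not a procedure from this thread: it only builds the commands and, by default, prints them instead of running them. It relies on the standard "hdfs dfsadmin -safemode enter" and "hdfs dfsadmin -saveNamespace" commands plus tar; the metadata directory and archive paths are hypothetical placeholders (use the value of dfs.namenode.name.dir on your NameNode).

```python
import os
import shlex
import subprocess
from typing import List


def build_backup_plan(name_dir: str, backup_file: str) -> List[List[str]]:
    """Commands for a pre-upgrade NameNode metadata backup, as argv lists.

    name_dir: the NameNode's dfs.namenode.name.dir (hypothetical example below).
    backup_file: where the tarball goes; copy it off the NameNode host afterwards.
    """
    parent, leaf = os.path.split(os.path.normpath(name_dir))
    return [
        # Put HDFS in safe mode so the namespace cannot change during the backup.
        ["hdfs", "dfsadmin", "-safemode", "enter"],
        # Write a fresh fsimage and reset the edit log (requires safe mode).
        ["hdfs", "dfsadmin", "-saveNamespace"],
        # Archive the whole metadata directory (fsimage, edits, VERSION).
        ["tar", "-czf", backup_file, "-C", parent, leaf],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for argv in plan:
        if dry_run:
            print("+ " + shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths; substitute your own before passing dry_run=False.
    run_plan(build_backup_plan("/hadoop/dfs/name", "/backup/nn-metadata-2.5.1.tar.gz"))
```

The plan deliberately leaves HDFS in safe mode, so you can go straight to stopping the cluster for the upgrade; otherwise run "hdfs dfsadmin -safemode leave" afterwards.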

On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com> wrote:

Hi Chathuri!

Technically there is a rollback option during the upgrade. I don't know how well it has been tested, but the idea is that the old metadata is not deleted until the cluster administrator runs:

    $ hdfs dfsadmin -finalizeUpgrade

I'm fairly confident that the HDFS upgrade will work smoothly. We have upgraded quite a few Hadoop 2.4.1 clusters to Hadoop 2.7.1 successfully (never having to roll back). It's the applications that run on top of HDFS and YARN that I'd be concerned about.

HTH
Ravi
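Musty's "OK with data availability" test can be made a little more concrete before anyone runs -finalizeUpgrade. One possible check, sketched in Python and not taken from this thread: save the summary printed by "hdfs fsck /" before the upgrade and again after it (with HBase and any other writers stopped in between), and only finalize if the post-upgrade report says HEALTHY, has no corrupt blocks, and shows the same size, file count and block count. The summary labels are assumed from Hadoop 2.x fsck output; adjust them if your version prints something different, and run the second fsck only after all DataNodes have reported in.

```python
import re
from typing import Dict

# Summary labels assumed from Hadoop 2.x "hdfs fsck /" output.
COMPARED = ("Total size", "Total files", "Total blocks (validated)")
FIELDS = COMPARED + ("Corrupt blocks",)


def parse_fsck_summary(text: str) -> Dict[str, int]:
    """Map each known 'Label: <number> ...' summary line to its leading integer."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        label, sep, value = line.strip().partition(":")
        if sep and label in FIELDS:
            match = re.match(r"\s*(\d+)", value)
            if match:
                counts[label] = int(match.group(1))
    return counts


def safe_to_finalize(before: str, after: str) -> bool:
    """True if the post-upgrade fsck is HEALTHY, has no corrupt blocks, and
    reports the same size, file count and block count as the pre-upgrade one."""
    old, new = parse_fsck_summary(before), parse_fsck_summary(after)
    healthy = re.search(r"^\s*Status:\s*HEALTHY\b", after, re.MULTILINE) is not None
    unchanged = all(key in old and old[key] == new.get(key) for key in COMPARED)
    return healthy and unchanged and new.get("Corrupt blocks", 0) == 0
```

Requiring every compared field to be present in the "before" report means a truncated or unexpected fsck output makes the check fail rather than pass silently.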

On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <ka...@gmail.com> wrote:

Thanks for the information, Ravi. Is there a way I can back up the data before the upgrade? I was thinking about this approach:
   - Copy the current Hadoop directories to a new set of directories.
   - Point Hadoop to this new set.
   - Start the migration with the backup set.
Please let me know if people have done this upgrade successfully. I believe many things can go wrong in a lengthy upgrade like this, and the data in the cluster is very important.
Thanks,
Chathuri
On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com> wrote:

Hi Chathuri!
   
   - When we upgrade, does it change the data structures on the NameNode and the DataNodes? I assume it only changes the NameNode.

It changes the NameNode as well as the DataNode layout. In fact, the upgrade will also take a long time on the DataNodes because of https://issues.apache.org/jira/browse/HDFS-6482
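To make the HDFS-6482 point concrete: as I understand it, that change (released in 2.6.0) has each DataNode compute a block's directory from its block ID, so upgrading from 2.5.1 means hard-linking every block file and its .meta checksum file into a new location on every data disk. That is why the DataNode side of the upgrade scales with the number of blocks. The Python sketch below assumes the 2.6/2.7-era mapping of two levels of 256 subdirectories taken from bits of the block ID; the bit masks, the file-name pattern and the example block IDs are my assumptions for illustration, and later releases changed the fan-out.

```python
from typing import List, Tuple


def block_dir(block_id: int) -> str:
    """Directory under a volume's 'finalized' dir for this block ID.

    Assumed mapping: bits 16-23 choose the first level, bits 8-15 the second
    (256 x 256 directories).
    """
    first = (block_id >> 16) & 0xFF
    second = (block_id >> 8) & 0xFF
    return f"subdir{first}/subdir{second}"


def upgrade_links(blocks: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """(file name, target dir) pairs the upgrade has to link, for each
    (block_id, generation_stamp): the data file plus its .meta checksum file."""
    links = []
    for block_id, gen_stamp in blocks:
        target = block_dir(block_id)
        links.append((f"blk_{block_id}", target))
        links.append((f"blk_{block_id}_{gen_stamp}.meta", target))
    return links


if __name__ == "__main__":
    # Hypothetical block IDs and generation stamps.
    for name, target in upgrade_links([(1073741825, 1001), (1073816389, 1002)]):
        print(f"{name} -> {target}/")
```

Two blocks already mean four links; a DataNode holding a few hundred thousand blocks has to create roughly twice that many links, which is where the hours go.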

   
   - What are the risks with this upgrade?


What Hadoop applications do you run on top of your cluster? The hope is that everything continues working smoothly for the most part, but inevitably some backward-incompatible changes creep in.

   
   - Is there a place where I can review the changes made to the file system from 2.5.1 to 2.7.2?

See the release notes at http://hadoop.apache.org/releases.html. You'd have to go through the changes in every release between 2.5.1 and 2.7.2.


In practice, I'd try running your applications on your upgraded test cluster first.

HTH


Ravi


unsubscribe

Posted by Marco Reis <ma...@marcoreis.net>.
On Thu, Mar 24, 2016 at 5:16 AM Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Hi Ravi,
>
> Thank you for all the information. Our application indexes Twitter data
> into HBase and then runs some data analytics on top of it. That's why the
> HDFS data is very important to us; we cannot tolerate any data loss during
> the upgrade. Do you remember how long it took you to upgrade from 2.4.1
> to 2.7.1?
>
> Thanks,
> Chathuri

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Chathuri Wimalasena <ka...@gmail.com>.
Hi Ravi,

We have 10 DataNodes. Each DataNode has 12 disks mounted and holds nearly
20 TB. Here is the df output from one of the DataNodes:

Filesystem            1K-blocks       Used  Available Use% Mounted on
/dev/md0              113183272   12830044   94597128  12% /
tmpfs                  66061772          0   66061772   0% /dev/shm
/dev/sdc1            3905108984 1847318400 2057790584  48% /data/sda
/dev/sdd1            3905108984 1766808072 2138300912  46% /data/sdb
/dev/sde1            3905108984 1762628972 2142480012  46% /data/sdc
/dev/sdf1            3905108984 1762803256 2142305728  46% /data/sdd
/dev/sdg1            3905108984 1757301724 2147807260  46% /data/sde
/dev/sdh1            3905108984 1764210768 2140898216  46% /data/sdf
/dev/sdi1            3905108984 1754803788 2150305196  45% /data/sdg
/dev/sdj1            3905108984 1753740904 2151368080  45% /data/sdh
/dev/sdk1            3905108984 1758186416 2146922568  46% /data/sdi
/dev/sdl1            3905108984 1757352332 2147756652  46% /data/sdj
/dev/sdm1            3905108984 1759121952 2145987032  46% /data/sdk
/dev/sdn1            3905108984 2991279120  913829864  77% /data/sdl
10.10.2.54:/home      113183744   55836672   51591168  52% /home
10.10.2.54:/vol/home1
                      976283648   93448192  882835456  10% /vol/home1
10.10.2.54:/vol/home2
                      976284672  412706816  563577856  43% /vol/home2
10.10.2.54:/vol/home3
                      976284672   51256320  925028352   6% /vol/home3

Thanks,
Chathuri
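Chathuri's df numbers can be plugged into Ravi's rule of thumb from the reply quoted below, and also checked against the earlier idea of copying the Hadoop directories before upgrading. This Python sketch parses the twelve /data rows of the df output above, flags disks where a second copy of the data would not fit next to the first, and estimates how many blocks this DataNode stores. Assumptions: 128 MB is the Hadoop 2.x default for dfs.blocksize and may not match this cluster; dividing used space by a full block size undercounts when blocks are smaller (HBase files, small files); and the "used > available" test ignores filesystem overhead.

```python
from typing import List, Tuple

# The twelve /data rows of the df output above (sizes in 1K-blocks, i.e. KiB).
DF_ROWS = """\
/dev/sdc1            3905108984 1847318400 2057790584  48% /data/sda
/dev/sdd1            3905108984 1766808072 2138300912  46% /data/sdb
/dev/sde1            3905108984 1762628972 2142480012  46% /data/sdc
/dev/sdf1            3905108984 1762803256 2142305728  46% /data/sdd
/dev/sdg1            3905108984 1757301724 2147807260  46% /data/sde
/dev/sdh1            3905108984 1764210768 2140898216  46% /data/sdf
/dev/sdi1            3905108984 1754803788 2150305196  45% /data/sdg
/dev/sdj1            3905108984 1753740904 2151368080  45% /data/sdh
/dev/sdk1            3905108984 1758186416 2146922568  46% /data/sdi
/dev/sdl1            3905108984 1757352332 2147756652  46% /data/sdj
/dev/sdm1            3905108984 1759121952 2145987032  46% /data/sdk
/dev/sdn1            3905108984 2991279120  913829864  77% /data/sdl
"""


def parse_df(text: str, prefix: str = "/data/") -> List[Tuple[str, int, int]]:
    """(mount point, used KiB, available KiB) for single-line df rows under prefix."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[5].startswith(prefix):
            rows.append((fields[5], int(fields[2]), int(fields[3])))
    return rows


def disks_without_room_for_a_copy(rows: List[Tuple[str, int, int]]) -> List[str]:
    """Mounts where a second, in-place copy of the data would not fit."""
    return [mount for mount, used, avail in rows if used > avail]


def estimate_blocks(used_kib: int, block_size_bytes: int = 128 * 1024 * 1024) -> int:
    """Rough block count assuming every block is full (real counts are usually higher)."""
    return -(-(used_kib * 1024) // block_size_bytes)  # ceiling division


if __name__ == "__main__":
    rows = parse_df(DF_ROWS)
    used = sum(u for _, u, _ in rows)
    print(f"used on /data: {used} KiB ({used / 2**30:.1f} TiB)")
    print("no room for an in-place copy:", disks_without_room_for_a_copy(rows))
    print("estimated blocks on this node:", estimate_blocks(used))
```

Run as a script, it reports about 20.9 TiB used on /data (consistent with "nearly 20 TB"), /data/sdl as the only disk without room for an in-place copy, and about 171,170 blocks, i.e. roughly 342,000 block and .meta files for this node to relink. As I understand it, the DataNode upgrade hard-links rather than copies blocks, so the upgrade itself does not need that extra space; only a manual full copy would.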

On Thu, Mar 24, 2016 at 2:45 PM, Ravi Prakash <ra...@gmail.com> wrote:

> Hi Chathuri!
>
> You're welcome! We did not have an HBase instance to upgrade. The time
> depends on how many blocks your DataNodes are storing (roughly, how big
> your disks are * how many disks you have * how full your disks are). What
> are those numbers for you? Our upgrades took anywhere from 1 to 3 hours.
>
> HTH
> Ravi

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Chathuri Wimalasena <ka...@gmail.com>.
Hi Ravi,

We have 10 data nodes. Each data node has 12 disks mounted and each data
node contains nearly 20 TB.

Filesystem            1K-blocks       Used  Available Use% Mounted on
/dev/md0              113183272   12830044   94597128  12% /
tmpfs                  66061772          0   66061772   0% /dev/shm
/dev/sdc1            3905108984 1847318400 2057790584  48% /data/sda
/dev/sdd1            3905108984 1766808072 2138300912  46% /data/sdb
/dev/sde1            3905108984 1762628972 2142480012  46% /data/sdc
/dev/sdf1            3905108984 1762803256 2142305728  46% /data/sdd
/dev/sdg1            3905108984 1757301724 2147807260  46% /data/sde
/dev/sdh1            3905108984 1764210768 2140898216  46% /data/sdf
/dev/sdi1            3905108984 1754803788 2150305196  45% /data/sdg
/dev/sdj1            3905108984 1753740904 2151368080  45% /data/sdh
/dev/sdk1            3905108984 1758186416 2146922568  46% /data/sdi
/dev/sdl1            3905108984 1757352332 2147756652  46% /data/sdj
/dev/sdm1            3905108984 1759121952 2145987032  46% /data/sdk
/dev/sdn1            3905108984 2991279120  913829864  77% /data/sdl
10.10.2.54:/home      113183744   55836672   51591168  52% /home
10.10.2.54:/vol/home1
                      976283648   93448192  882835456  10% /vol/home1
10.10.2.54:/vol/home2
                      976284672  412706816  563577856  43% /vol/home2
10.10.2.54:/vol/home3
                      976284672   51256320  925028352   6% /vol/home3

Thanks,
Chathuri

On Thu, Mar 24, 2016 at 2:45 PM, Ravi Prakash <ra...@gmail.com> wrote:

> Hi Chathuri!
>
> You're welcome! We did not have an HBase instance to upgrade. It depends
> on how many blocks your datanodes are storing (== how big your disks are *
> how many disks you have * how full your disks are). What are those numbers
> for you? We experienced anywhere from 1-3 hours for the upgrade.
>
> HTH
> Ravi
>
> On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <kamalasini@gmail.com
> > wrote:
>
>> Hi Ravi,
>>
>> Thank you for all the information, Our application is indexing twitter
>> data to HBase and then do some data analytics on top of that. That's why
>> HDFS data is very important to us. We cannot tolerate any data loss with
>> the update. Do you remember how long it took for you to upgrade it from
>> 2.4.1 to 2.7.1 ?
>>
>> Thanks,
>> Chathuri
>>
>> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
>> wrote:
>>
>>> Hi Chathuri!
>>>
>>> Technically there is a rollback option during upgrade. I don't know how
>>> well it has been tested, but the idea is that old metadata is not deleted
>>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>>> having to roll back). Its your applications that work on top of HDFS and
>>> YARN that I'd be concerned about.
>>>
>>> HTH
>>> Ravi
>>>
>>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>>> kamalasini@gmail.com> wrote:
>>>
>>>> Thanks for information Ravi. Is there a way that I can back up data
>>>> before the  update ? I was thinking about this approach..
>>>>
>>>> Copy the current hadoop directories to a new set of directories.
>>>> Point hadoop to this new set
>>>> Start the migration with the backup set
>>>>
>>>> Please let me know if people have done this upgrade successfully. I
>>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>>> in the cluster is very important.
>>>> Thanks,
>>>> Chathuri
>>>>
>>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Chathuri!
>>>>>
>>>>>    - When we upgrade, does it change the namenode data structures and
>>>>>    data nodes? I assume it only changes the name node...
>>>>>
>>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>>> upgrade will take a long time on Datanodes as well because of
>>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>>
>>>>>    - What are the risks with this upgrade ?
>>>>>
>>>>> What Hadoop applications do you run on top of your cluster? The hope
>>>>> is that everything continues working smoothly for the most part, but
>>>>> inevitably some backward incompatible changes creep in.
>>>>>
>>>>>    - Is there a place where I can review the changes made to file
>>>>>    system from 2.5.1 to 2.7.2?
>>>>>
>>>>> The release notes. http://hadoop.apache.org/releases.html .You'd have
>>>>> to accumulate all the changes in the versions.
>>>>>
>>>>> Practically, I'd try to run my application on your upgraded test
>>>>> cluster.
>>>>>
>>>>> HTH
>>>>>
>>>>> Ravi
>>>>>
>>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>>> kamalasini@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>>
>>>>>> I followed the following link (
>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>>> went without any issues. But this system did not have that much data as the
>>>>>> production system.
>>>>>>
>>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>>> would like to see what other people have done in these cases and their
>>>>>> experiences... Here are few questions I have..
>>>>>>
>>>>>>    - When we upgrade, does it change the namenode data structures
>>>>>>    and data nodes? I assume it only changes the name node...
>>>>>>    - What are the risks with this upgrade ?
>>>>>>    - Is there a place where I can review the changes made to file
>>>>>>    system from 2.5.1 to 2.7.2?
>>>>>>
>>>>>> I would really appreciate if you can share your experiences.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Chathuri
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Chathuri Wimalasena <ka...@gmail.com>.
Hi Ravi,

We have 10 data nodes. Each data node has 12 disks mounted and each data
node contains nearly 20 TB.

Filesystem            1K-blocks       Used  Available Use% Mounted on
/dev/md0              113183272   12830044   94597128  12% /
tmpfs                  66061772          0   66061772   0% /dev/shm
/dev/sdc1            3905108984 1847318400 2057790584  48% /data/sda
/dev/sdd1            3905108984 1766808072 2138300912  46% /data/sdb
/dev/sde1            3905108984 1762628972 2142480012  46% /data/sdc
/dev/sdf1            3905108984 1762803256 2142305728  46% /data/sdd
/dev/sdg1            3905108984 1757301724 2147807260  46% /data/sde
/dev/sdh1            3905108984 1764210768 2140898216  46% /data/sdf
/dev/sdi1            3905108984 1754803788 2150305196  45% /data/sdg
/dev/sdj1            3905108984 1753740904 2151368080  45% /data/sdh
/dev/sdk1            3905108984 1758186416 2146922568  46% /data/sdi
/dev/sdl1            3905108984 1757352332 2147756652  46% /data/sdj
/dev/sdm1            3905108984 1759121952 2145987032  46% /data/sdk
/dev/sdn1            3905108984 2991279120  913829864  77% /data/sdl
10.10.2.54:/home      113183744   55836672   51591168  52% /home
10.10.2.54:/vol/home1
                      976283648   93448192  882835456  10% /vol/home1
10.10.2.54:/vol/home2
                      976284672  412706816  563577856  43% /vol/home2
10.10.2.54:/vol/home3
                      976284672   51256320  925028352   6% /vol/home3

Thanks,
Chathuri

On Thu, Mar 24, 2016 at 2:45 PM, Ravi Prakash <ra...@gmail.com> wrote:

> Hi Chathuri!
>
> You're welcome! We did not have an HBase instance to upgrade. It depends
> on how many blocks your datanodes are storing (which scales with how big
> your disks are * how many disks you have * how full your disks are). What
> are those numbers for you? Our upgrades took anywhere from 1 to 3 hours.
>
> HTH
> Ravi
>
> On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <kamalasini@gmail.com>
> wrote:
>
>> Hi Ravi,
>>
>> Thank you for all the information. Our application indexes Twitter
>> data into HBase and then runs some data analytics on top of it. That's
>> why HDFS data is very important to us. We cannot tolerate any data loss
>> with the upgrade. Do you remember how long it took you to upgrade from
>> 2.4.1 to 2.7.1?
>>
>> Thanks,
>> Chathuri
>>
>> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
>> wrote:
>>
>>> Hi Chathuri!
>>>
>>> Technically there is a rollback option during upgrade. I don't know how
>>> well it has been tested, but the idea is that old metadata is not deleted
>>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>>> having to roll back). It's your applications that work on top of HDFS and
>>> YARN that I'd be concerned about.
>>>
>>> HTH
>>> Ravi
>>>
>>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>>> kamalasini@gmail.com> wrote:
>>>
>>>> Thanks for the information, Ravi. Is there a way I can back up the data
>>>> before the upgrade? I was thinking about this approach:
>>>>
>>>> Copy the current Hadoop directories to a new set of directories.
>>>> Point Hadoop to this new set.
>>>> Start the migration with the backup set.
>>>>
>>>> Please let me know if people have done this upgrade successfully. I
>>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>>> in the cluster is very important.
>>>> Thanks,
>>>> Chathuri
>>>>
>>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Chathuri!
>>>>>
>>>>>    - When we upgrade, does it change the data structures on both the name
>>>>>    node and the data nodes? I assume it only changes the name node.
>>>>>
>>>>> It changes the NN as well as the DN layout. In fact, this upgrade will
>>>>> also take a long time on the DataNodes because of
>>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>>
>>>>>    - What are the risks with this upgrade?
>>>>>
>>>>> What Hadoop applications do you run on top of your cluster? The hope
>>>>> is that everything continues working smoothly for the most part, but
>>>>> inevitably some backward-incompatible changes creep in.
>>>>>
>>>>>    - Is there a place where I can review the changes made to the file
>>>>>    system from 2.5.1 to 2.7.2?
>>>>>
>>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>>> to accumulate the changes across all the intervening versions.
>>>>>
>>>>> In practice, I'd try running your application on an upgraded test
>>>>> cluster.
>>>>>
>>>>> HTH
>>>>>
>>>>> Ravi
>>>>>
>>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>>> kamalasini@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a Hadoop production deployment with 1 name node and 10 data
>>>>>> nodes; it holds more than 20 TB of data in HDFS. We are currently using
>>>>>> Hadoop 2.5.1 and we want to upgrade it to the latest Hadoop version, 2.7.2.
>>>>>>
>>>>>> I followed this guide (
>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>>> and upgraded a single-node system running in pseudo-distributed mode, and it
>>>>>> went without any issues. However, that system did not have nearly as much
>>>>>> data as the production system.
>>>>>>
>>>>>> Since this is a production system, I'm reluctant to do this upgrade. I
>>>>>> would like to hear what other people have done in these cases and what
>>>>>> their experiences were. Here are a few questions I have:
>>>>>>
>>>>>>    - When we upgrade, does it change the data structures on both the name
>>>>>>    node and the data nodes? I assume it only changes the name node.
>>>>>>    - What are the risks with this upgrade?
>>>>>>    - Is there a place where I can review the changes made to the file
>>>>>>    system from 2.5.1 to 2.7.2?
>>>>>>
>>>>>> I would really appreciate it if you could share your experiences.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Chathuri
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
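Ravi's point that upgrade time depends on the number of blocks can be turned into a rough estimate from the df listing above. This is a minimal sketch under stated assumptions: the HDFS data disks are exactly the /data/* mounts, they hold only HDFS data, and the block size is the Hadoop 2.x default of 128 MiB (pass the real dfs.blocksize in bytes if it differs). Because files smaller than one block still occupy a block each, the result is a lower bound. For the listing above (about 22.4 billion KiB used on the /data disks) it gives roughly 171,000 block replicas on that DataNode, or on the order of 1.7 million across 10 similar DataNodes.

```shell
#!/usr/bin/env bash
# estimate_blocks: rough lower bound on the block replicas stored on one
# DataNode, read from df output in 1K-blocks (e.g. `df -kP`) on stdin.
# Assumptions: HDFS data disks are the mounts under /data/, they hold only
# HDFS data, and the block size is 128 MiB (Hadoop 2.x default dfs.blocksize)
# unless another size in bytes is given as $1.
estimate_blocks() {
  local block_bytes="${1:-134217728}"
  if (( block_bytes < 1024 )); then
    echo "estimate_blocks: block size must be at least 1024 bytes" >&2
    return 1
  fi
  awk -v block_kib="$(( block_bytes / 1024 ))" '
    # df columns: Filesystem 1K-blocks Used Available Use% Mounted-on
    NF == 6 && $6 ~ /^\/data\// { used_kib += $3 }
    END {
      printf "%.0f KiB used, ~%.0f blocks\n", used_kib, int(used_kib / block_kib)
    }'
}
```

On each DataNode, source the function and run df -kP | estimate_blocks; the -P flag keeps every filesystem on one line, unlike the wrapped NFS entries in the listing above.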

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!

You're welcome! We did not have an HBase instance to upgrade. It depends on
how many blocks your datanodes are storing (which scales with how big your
disks are * how many disks you have * how full your disks are). What are
those numbers for you? Our upgrades took anywhere from 1 to 3 hours.

HTH
Ravi

On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Hi Ravi,
>
> Thank you for all the information. Our application indexes Twitter
> data into HBase and then runs some data analytics on top of it. That's
> why HDFS data is very important to us. We cannot tolerate any data loss
> with the upgrade. Do you remember how long it took you to upgrade from
> 2.4.1 to 2.7.1?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). It's your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for the information, Ravi. Is there a way I can back up the data
>>> before the upgrade? I was thinking about this approach:
>>>
>>> Copy the current Hadoop directories to a new set of directories.
>>> Point Hadoop to this new set.
>>> Start the migration with the backup set.
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>>    - When we upgrade, does it change the data structures on both the name
>>>>    node and the data nodes? I assume it only changes the name node.
>>>>
>>>> It changes the NN as well as the DN layout. In fact, this upgrade will
>>>> also take a long time on the DataNodes because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>>    - What are the risks with this upgrade?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward-incompatible changes creep in.
>>>>
>>>>    - Is there a place where I can review the changes made to the file
>>>>    system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate the changes across all the intervening versions.
>>>>
>>>> In practice, I'd try running your application on an upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a Hadoop production deployment with 1 name node and 10 data
>>>>> nodes; it holds more than 20 TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to upgrade it to the latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed this guide (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and upgraded a single-node system running in pseudo-distributed mode, and it
>>>>> went without any issues. However, that system did not have nearly as much
>>>>> data as the production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this upgrade. I
>>>>> would like to hear what other people have done in these cases and what
>>>>> their experiences were. Here are a few questions I have:
>>>>>
>>>>>    - When we upgrade, does it change the data structures on both the name
>>>>>    node and the data nodes? I assume it only changes the name node.
>>>>>    - What are the risks with this upgrade?
>>>>>    - Is there a place where I can review the changes made to the file
>>>>>    system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate it if you could share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>
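The rollback path Ravi describes (old metadata is kept until the administrator runs -finalizeUpgrade) corresponds to the non-rolling "Upgrade and Rollback" steps in the HDFS User Guide. A minimal sketch, assuming the Hadoop 2.x start-dfs.sh/stop-dfs.sh scripts are on the PATH, a single NameNode, and hypothetical NAME_DIR and BACKUP_DIR paths; by default it only prints the commands. One caveat from the same guide: until the upgrade is finalized, deleting files that existed before the upgrade does not free real disk space on the DataNodes, which matters for a disk that is already 77% full.

```shell
#!/usr/bin/env bash
# Sketch: non-rolling HDFS upgrade that keeps a rollback path open, following
# the "Upgrade and Rollback" steps of the HDFS User Guide (Hadoop 2.x scripts).
# Hypothetical placeholders: NAME_DIR (the local dfs.namenode.name.dir path)
# and BACKUP_DIR. DRY_RUN=1 (the default) only prints each command;
# DRY_RUN=0 executes them and stops at the first failure.
set -euo pipefail

NAME_DIR="${NAME_DIR:-/path/to/dfs/name}"     # hypothetical placeholder
BACKUP_DIR="${BACKUP_DIR:-$HOME/nn-backup}"   # hypothetical placeholder

run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Still on the OLD version, after stopping HBase and other HDFS clients:
#    checkpoint the namespace, stop HDFS and copy the NameNode metadata.
backup_metadata() {
  run hdfs dfsadmin -safemode enter
  run hdfs dfsadmin -saveNamespace   # needs safe mode; writes a fresh fsimage
  run stop-dfs.sh
  run mkdir -p "$BACKUP_DIR"
  run tar -czf "$BACKUP_DIR/namenode-meta.tar.gz" -C "$NAME_DIR" .
}

# 2. With the NEW binaries installed everywhere: start in upgrade mode. The
#    NameNode and the DataNodes keep their old state in previous/ directories
#    (DataNodes hard-link the existing block files) until finalization.
start_upgrade() {
  run start-dfs.sh -upgrade
  run hdfs dfsadmin -report
  run hdfs fsck /
}

# 3a. Only after the applications have been verified: drop the old state.
finalize() {
  run hdfs dfsadmin -finalizeUpgrade
}

# 3b. Or, before finalizing, roll back (reinstall the OLD binaries first).
rollback() {
  run stop-dfs.sh
  run hdfs namenode -rollback
  run start-dfs.sh -rollback
}

# Usage: DRY_RUN=1 bash hdfs-upgrade.sh backup_metadata
if [[ $# -gt 0 ]]; then "$@"; fi
```

For example, DRY_RUN=1 bash hdfs-upgrade.sh backup_metadata prints the five backup commands prefixed with "+" without touching the cluster (the file name is arbitrary). Keeping finalize as a separate, explicit step mirrors the advice in the thread not to finalize until the applications have been checked.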

unsubscribe

Posted by Marco Reis <ma...@marcoreis.net>.
On Thu, Mar 24, 2016 at 5:16 AM Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Hi Ravi,
>
> Thank you for all the information. Our application indexes Twitter
> data into HBase and then runs some data analytics on top of it. That's
> why HDFS data is very important to us. We cannot tolerate any data loss
> with the upgrade. Do you remember how long it took you to upgrade from
> 2.4.1 to 2.7.1?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). It's your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for the information, Ravi. Is there a way I can back up the data
>>> before the upgrade? I was thinking about this approach:
>>>
>>> Copy the current Hadoop directories to a new set of directories.
>>> Point Hadoop to this new set.
>>> Start the migration with the backup set.
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>>    - When we upgrade, does it change the data structures on both the name
>>>>    node and the data nodes? I assume it only changes the name node.
>>>>
>>>> It changes the NN as well as the DN layout. In fact, this upgrade will
>>>> also take a long time on the DataNodes because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>>    - What are the risks with this upgrade?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward-incompatible changes creep in.
>>>>
>>>>    - Is there a place where I can review the changes made to the file
>>>>    system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate the changes across all the intervening versions.
>>>>
>>>> In practice, I'd try running your application on an upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a Hadoop production deployment with 1 name node and 10 data
>>>>> nodes; it holds more than 20 TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to upgrade it to the latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed this guide (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and upgraded a single-node system running in pseudo-distributed mode, and it
>>>>> went without any issues. However, that system did not have nearly as much
>>>>> data as the production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this upgrade. I
>>>>> would like to hear what other people have done in these cases and what
>>>>> their experiences were. Here are a few questions I have:
>>>>>
>>>>>    - When we upgrade, does it change the data structures on both the name
>>>>>    node and the data nodes? I assume it only changes the name node.
>>>>>    - What are the risks with this upgrade?
>>>>>    - Is there a place where I can review the changes made to the file
>>>>>    system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate it if you could share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>

unsubscribe

Posted by Marco Reis <ma...@marcoreis.net>.
On Thu, Mar 24, 2016 at 5:16 AM Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Hi Ravi,
>
> Thank you for all the information, Our application is indexing twitter
> data to HBase and then do some data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took for you to upgrade it from
> 2.4.1 to 2.7.1 ?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). Its your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for information Ravi. Is there a way that I can back up data
>>> before the  update ? I was thinking about this approach..
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>>    - When we upgrade, does it change the namenode data structures and
>>>>    data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>>    - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>>    - Is there a place where I can review the changes made to file
>>>>    system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes. http://hadoop.apache.org/releases.html .You'd have
>>>> to accumulate all the changes in the versions.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed the following link (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>> went without any issues. But this system did not have that much data as the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences... Here are few questions I have..
>>>>>
>>>>>    - When we upgrade, does it change the namenode data structures and
>>>>>    data nodes? I assume it only changes the name node...
>>>>>    - What are the risks with this upgrade ?
>>>>>    - Is there a place where I can review the changes made to file
>>>>>    system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!

You're welcome! We did not have an HBase instance to upgrade. It depends on
how many blocks your datanodes are storing (== how big your disks are * how
many disks you have * how full your disks are). What are those numbers for
you? We experienced anywhere from 1-3 hours for the upgrade.

HTH
Ravi

On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Hi Ravi,
>
> Thank you for all the information, Our application is indexing twitter
> data to HBase and then do some data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took for you to upgrade it from
> 2.4.1 to 2.7.1 ?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). Its your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for information Ravi. Is there a way that I can back up data
>>> before the  update ? I was thinking about this approach..
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>>    - When we upgrade, does it change the namenode data structures and
>>>>    data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>>    - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>>    - Is there a place where I can review the changes made to file
>>>>    system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes. http://hadoop.apache.org/releases.html .You'd have
>>>> to accumulate all the changes in the versions.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed the following link (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>> went without any issues. But this system did not have that much data as the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences... Here are few questions I have..
>>>>>
>>>>>    - When we upgrade, does it change the namenode data structures and
>>>>>    data nodes? I assume it only changes the name node...
>>>>>    - What are the risks with this upgrade ?
>>>>>    - Is there a place where I can review the changes made to file
>>>>>    system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!

You're welcome! We did not have an HBase instance to upgrade. It depends on
how many blocks your datanodes are storing (== how big your disks are * how
many disks you have * how full your disks are). What are those numbers for
you? We experienced anywhere from 1-3 hours for the upgrade.

HTH
Ravi

On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Hi Ravi,
>
> Thank you for all the information, Our application is indexing twitter
> data to HBase and then do some data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took for you to upgrade it from
> 2.4.1 to 2.7.1 ?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). Its your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for information Ravi. Is there a way that I can back up data
>>> before the  update ? I was thinking about this approach..
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>>    - When we upgrade, does it change the namenode data structures and
>>>>    data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>>    - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>>    - Is there a place where I can review the changes made to file
>>>>    system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate the changes from every release in between.
>>>>
>>>> In practice, I'd try running your applications on an upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a Hadoop production deployment with 1 NameNode and 10 DataNodes
>>>>> that holds more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and want to upgrade to the latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed this guide (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and upgraded a single-node system running in pseudo-distributed mode
>>>>> without any issues. But that system did not have as much data as the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm hesitant to do this upgrade. I
>>>>> would like to hear what other people have done in similar cases and
>>>>> what their experiences were. Here are a few questions I have:
>>>>>
>>>>>    - When we upgrade, does it change the namenode data structures and
>>>>>    data nodes? I assume it only changes the name node...
>>>>>    - What are the risks with this upgrade ?
>>>>>    - Is there a place where I can review the changes made to file
>>>>>    system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>
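
Here is a sketch of why the DataNode side of the upgrade is slow, per the HDFS-6482 remark quoted above. As we understand that change, it landed between 2.5.1 and 2.7.2 (in 2.6.0, as far as we know). It stores each finalized block under two levels of "subdir" directories chosen from bits of the block ID. Upgrading from the older layout therefore means hard-linking every existing block file into the new tree. The function below mirrors our reading of the 2.6/2.7 DatanodeUtil.idToBlockDir logic, which masks with 0xff at bit offsets 16 and 8. Later releases changed the fan-out, so treat those constants as version-specific assumptions.

```shell
#!/usr/bin/env bash
# Sketch (assumption: Hadoop 2.6/2.7 HDFS-6482 layout) of how a block ID maps
# to its two-level directory under a DataNode's ".../current/finalized" dir.
set -eu

block_subdir() {
  local id="$1"
  # First level: bits 16..23 of the block ID (256 possible subdirs).
  local d1=$(( (id >> 16) & 0xff ))
  # Second level: bits 8..15 of the block ID (256 possible subdirs).
  local d2=$(( (id >> 8) & 0xff ))
  echo "subdir${d1}/subdir${d2}"
}

# 1073741825 is the first ID handed out by the sequential block-ID generator.
block_subdir 1073741825
```

Each block needs its own hard link (plus one for its .meta file), so the DataNode upgrade time grows with the number of blocks rather than with the bytes stored.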

unsubscribe

Posted by Marco Reis <ma...@marcoreis.net>.

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!

You're welcome! We did not have an HBase instance to upgrade. The upgrade time
depends on how many blocks your DataNodes are storing (roughly: how big your
disks are * how many disks you have * how full your disks are). What are
those numbers for you? Our upgrades took anywhere from 1 to 3 hours.
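
The rule of thumb above can be turned into a rough per-DataNode block count by dividing the used space by the block size. This is only an illustration. The disk sizes are hypothetical, and 128 MB is the Hadoop 2.x default for dfs.blocksize, so adjust it if yours differs. No block is larger than the block size, so on disks that hold only HDFS data this gives closer to a floor than an exact count. Many small files push the real number higher. For the actual figure, the "Total blocks" line printed by hdfs fsck / is more reliable. That line counts logical blocks, so multiply by the replication factor to get the replicas spread across DataNodes.

```shell
#!/usr/bin/env bash
# Rough block estimate for one DataNode: disk size * disks * fullness / block size.
# Assumptions: sizes in GB, fullness in percent, block size in MB (default 128).
set -eu

estimate_blocks() {
  local disk_gb="$1" disks="$2" pct_full="$3" block_mb="${4:-128}"
  # Used space in MB on the node, divided by the block size in MB.
  echo $(( disk_gb * 1024 * disks * pct_full / 100 / block_mb ))
}

# Hypothetical DataNode: 4 disks of 2000 GB each, 60% full, 128 MB blocks.
estimate_blocks 2000 4 60
```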

HTH
Ravi

On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Hi Ravi,
>
> Thank you for all the information. Our application indexes Twitter
> data into HBase and then runs data analytics on top of it. That's why the
> HDFS data is very important to us; we cannot tolerate any data loss during
> the upgrade. Do you remember how long it took you to upgrade from
> 2.4.1 to 2.7.1?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). It's your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for the information, Ravi. Is there a way that I can back up the data
>>> before the update? I was thinking about this approach:
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>>    - When we upgrade, does it change the namenode data structures and
>>>>    data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>>    - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>>    - Is there a place where I can review the changes made to file
>>>>    system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate the changes from every release in between.
>>>>
>>>> In practice, I'd try running your applications on an upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a Hadoop production deployment with 1 NameNode and 10 DataNodes
>>>>> that holds more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and want to upgrade to the latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed this guide (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and upgraded a single-node system running in pseudo-distributed mode
>>>>> without any issues. But that system did not have as much data as the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm hesitant to do this upgrade. I
>>>>> would like to hear what other people have done in similar cases and
>>>>> what their experiences were. Here are a few questions I have:
>>>>>
>>>>>    - When we upgrade, does it change the namenode data structures and
>>>>>    data nodes? I assume it only changes the name node...
>>>>>    - What are the risks with this upgrade ?
>>>>>    - Is there a place where I can review the changes made to file
>>>>>    system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Chathuri Wimalasena <ka...@gmail.com>.
Hi Ravi,

Thank you for all the information. Our application indexes Twitter data
into HBase and then runs data analytics on top of it. That's why the HDFS
data is very important to us; we cannot tolerate any data loss during the
upgrade. Do you remember how long it took you to upgrade from 2.4.1
to 2.7.1?

Thanks,
Chathuri

On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com> wrote:

> Hi Chathuri!
>
> Technically there is a rollback option during upgrade. I don't know how
> well it has been tested, but the idea is that old metadata is not deleted
> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
> having to roll back). It's your applications that work on top of HDFS and
> YARN that I'd be concerned about.
>
> HTH
> Ravi
>
> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <kamalasini@gmail.com
> > wrote:
>
>> Thanks for the information, Ravi. Is there a way that I can back up the data
>> before the update? I was thinking about this approach:
>>
>> Copy the current hadoop directories to a new set of directories.
>> Point hadoop to this new set
>> Start the migration with the backup set
>>
>> Please let me know if people have done this upgrade successfully. I
>> believe many things can go wrong in a lengthy upgrade like this. The data
>> in the cluster is very important.
>> Thanks,
>> Chathuri
>>
>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>> wrote:
>>
>>> Hi Chathuri!
>>>
>>>    - When we upgrade, does it change the namenode data structures and
>>>    data nodes? I assume it only changes the name node...
>>>
>>> It changes the NN as well as DN layout. As a matter of fact, this
>>> upgrade will take a long time on Datanodes as well because of
>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>
>>>    - What are the risks with this upgrade ?
>>>
>>> What Hadoop applications do you run on top of your cluster? The hope is
>>> that everything continues working smoothly for the most part, but
>>> inevitably some backward incompatible changes creep in.
>>>
>>>    - Is there a place where I can review the changes made to file
>>>    system from 2.5.1 to 2.7.2?
>>>
>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>> to accumulate the changes from every release in between.
>>>
>>> In practice, I'd try running your applications on an upgraded test cluster.
>>>
>>> HTH
>>>
>>> Ravi
>>>
>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>> kamalasini@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have a Hadoop production deployment with 1 NameNode and 10 DataNodes
>>>> that holds more than 20TB of data in HDFS. We are currently using
>>>> Hadoop 2.5.1 and want to upgrade to the latest Hadoop version, 2.7.2.
>>>>
>>>> I followed this guide (
>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>> and upgraded a single-node system running in pseudo-distributed mode
>>>> without any issues. But that system did not have as much data as the
>>>> production system.
>>>>
>>>> Since this is a production system, I'm hesitant to do this upgrade. I
>>>> would like to hear what other people have done in similar cases and
>>>> what their experiences were. Here are a few questions I have:
>>>>
>>>>    - When we upgrade, does it change the namenode data structures and
>>>>    data nodes? I assume it only changes the name node...
>>>>    - What are the risks with this upgrade ?
>>>>    - Is there a place where I can review the changes made to file
>>>>    system from 2.5.1 to 2.7.2?
>>>>
>>>> I would really appreciate if you can share your experiences.
>>>>
>>>> Thanks in advance,
>>>> Chathuri
>>>>
>>>
>>>
>>
>

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!

Technically, there is a rollback option during the upgrade. I don't know how
well it has been tested, but the idea is that the old metadata is not deleted
until the cluster administrator runs $ hdfs dfsadmin -finalizeUpgrade . I'm
fairly confident that the HDFS upgrade will work smoothly. We have upgraded
quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
having to roll back). It's the applications that run on top of HDFS and
YARN that I'd be concerned about.
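
The rollback window described above belongs to the non-rolling ("downtime") upgrade in the HDFS user guide. It can be sketched as a few phases. Several details here are assumptions, not a tested runbook:

- There is a single non-HA NameNode, as in Chathuri's cluster.
- The stock Hadoop 2.x scripts (start-dfs.sh, stop-dfs.sh, hdfs) are on PATH.
- The rollback steps follow the user guide's order. Check them against the docs for your exact release.
- DRY_RUN defaults to 1, so the functions only print what they would run.

The rolling-upgrade guide linked in the original question uses hdfs dfsadmin -rollingUpgrade prepare|query|finalize instead. As we understand it, that is a separate procedure from the -upgrade path shown here.

```shell
#!/usr/bin/env bash
# Sketch of a downtime HDFS upgrade with a rollback window (assumptions above).
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

prepare() {
  # Still on the OLD release: write a fresh checkpoint, then stop HDFS.
  run hdfs dfsadmin -safemode enter
  run hdfs dfsadmin -saveNamespace
  run stop-dfs.sh
}

upgrade() {
  # NEW release installed on all nodes: start HDFS in upgrade mode.
  # The pre-upgrade state is kept until the upgrade is finalized.
  run start-dfs.sh -upgrade
}

finalize() {
  # Only after applications are verified: discards the old state,
  # so rollback is no longer possible afterwards.
  run hdfs dfsadmin -finalizeUpgrade
}

rollback() {
  # Only before finalizing, after stop-dfs.sh and after reinstalling the
  # OLD release on every node.
  run hdfs namenode -rollback
  run start-dfs.sh -rollback
}
```

The intended order is: prepare, install 2.7.2, upgrade, test your applications, then either finalize or (stop, reinstall 2.5.1) rollback. Set DRY_RUN=0 only once the printed commands look right for your cluster.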

HTH
Ravi

On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Thanks for the information, Ravi. Is there a way that I can back up the data
> before the update? I was thinking about this approach:
>
> Copy the current hadoop directories to a new set of directories.
> Point hadoop to this new set
> Start the migration with the backup set
>
> Please let me know if people have done this upgrade successfully. I
> believe many things can go wrong in a lengthy upgrade like this. The data
> in the cluster is very important.
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>>    - When we upgrade, does it change the namenode data structures and
>>    data nodes? I assume it only changes the name node...
>>
>> It changes the NN as well as DN layout. As a matter of fact, this upgrade
>> will take a long time on Datanodes as well because of
>> https://issues.apache.org/jira/browse/HDFS-6482
>>
>>    - What are the risks with this upgrade ?
>>
>> What Hadoop applications do you run on top of your cluster? The hope is
>> that everything continues working smoothly for the most part, but
>> inevitably some backward incompatible changes creep in.
>>
>>    - Is there a place where I can review the changes made to file system
>>    from 2.5.1 to 2.7.2?
>>
>> The release notes: http://hadoop.apache.org/releases.html . You'd have to
>> accumulate the changes from every release in between.
>>
>> In practice, I'd try running your applications on an upgraded test cluster.
>>
>> HTH
>>
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a Hadoop production deployment with 1 NameNode and 10 DataNodes
>>> that holds more than 20TB of data in HDFS. We are currently using
>>> Hadoop 2.5.1 and want to upgrade to the latest Hadoop version, 2.7.2.
>>>
>>> I followed this guide (
>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>> and upgraded a single-node system running in pseudo-distributed mode
>>> without any issues. But that system did not have as much data as the
>>> production system.
>>>
>>> Since this is a production system, I'm hesitant to do this upgrade. I
>>> would like to hear what other people have done in similar cases and
>>> what their experiences were. Here are a few questions I have:
>>>
>>>    - When we upgrade, does it change the namenode data structures and
>>>    data nodes? I assume it only changes the name node...
>>>    - What are the risks with this upgrade ?
>>>    - Is there a place where I can review the changes made to file
>>>    system from 2.5.1 to 2.7.2?
>>>
>>> I would really appreciate if you can share your experiences.
>>>
>>> Thanks in advance,
>>> Chathuri
>>>
>>
>>
>
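
This is one way to handle the backup question quoted above, which is also the metadata-backup advice given elsewhere in the thread. Back up the NameNode metadata rather than copying all 20TB of block data. As we understand it, the DataNode upgrade keeps the pre-upgrade state through hard links rather than full copies until finalization. The sketch below makes several assumptions:

- BACKUP_ROOT is a hypothetical local directory outside the NameNode's own name dirs.
- dfs.namenode.name.dir holds comma-separated paths or file:// URIs with no spaces.
- hdfs dfsadmin -saveNamespace has been run in safe mode first, so the fetched image is current.

DRY_RUN defaults to 1 and only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: back up NameNode metadata before an HDFS upgrade (assumptions above).
set -eu

DRY_RUN="${DRY_RUN:-1}"
BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/hdfs-meta}"   # hypothetical location

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Turn a dfs.namenode.name.dir value into one local path per line,
# stripping any file:// scheme.
name_dir_paths() {
  local IFS=','
  local entry
  for entry in $1; do
    echo "${entry#file://}"
  done
}

fetch_image() {
  # Run while the NameNode is up: downloads its most recent fsimage.
  run mkdir -p "$BACKUP_ROOT/fetched-image"
  run hdfs dfsadmin -fetchImage "$BACKUP_ROOT/fetched-image"
}

archive_name_dirs() {
  # Run with HDFS stopped, so the edit logs are not changing while copied.
  local i=0 dir
  name_dir_paths "$1" | while IFS= read -r dir; do
    run tar -czf "$BACKUP_ROOT/name-dir-$i.tar.gz" -C "$dir" .
    i=$((i + 1))
  done
}
```

A typical call would be archive_name_dirs "$(hdfs getconf -confKey dfs.namenode.name.dir)" after stop-dfs.sh. Keeping a copy on a different machine protects against losing the NameNode host itself.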

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!

Technically there is a rollback option during the upgrade. I don't know how
well it has been tested, but the idea is that the old metadata is not
deleted until the cluster administrator runs
$ hdfs dfsadmin -finalizeUpgrade
I'm fairly confident that the HDFS upgrade will work smoothly. We have
upgraded quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully
(never having to roll back). It's your applications that work on top of
HDFS and YARN that I'd be concerned about.
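The upgrade, rollback and finalize cycle Ravi describes can be sketched as a small Python driver. This is a sketch under stated assumptions, not a tested procedure: it assumes a non-HA Hadoop 2.x cluster with a single NameNode, that the "hdfs", "start-dfs.sh" and "stop-dfs.sh" commands of the appropriate installation are on the PATH, and that swapping the binaries from 2.5.1 to 2.7.2 (and back, for a rollback) is done by hand between phases. The step lists and the "run" helper are hypothetical names; by default the helper only prints the commands.

```python
import shlex
import subprocess

# Phase 1 (old 2.5.1 binaries): checkpoint, then stop HDFS.
# -saveNamespace requires safe mode; it saves the current namespace into
# the storage directories and resets the edit log, giving a clean point
# to upgrade from (and to copy, if you also take a manual backup).
PRE_UPGRADE = [
    ["hdfs", "dfsadmin", "-safemode", "enter"],
    ["hdfs", "dfsadmin", "-saveNamespace"],
    ["stop-dfs.sh"],
]

# Phase 2 (new 2.7.2 binaries): start HDFS in upgrade mode. The
# pre-upgrade metadata is kept until the upgrade is finalized.
START_UPGRADE = [["start-dfs.sh", "-upgrade"]]

# Phase 3a: only after data and applications have been verified.
# This discards the preserved copy, so rollback is no longer possible.
FINALIZE = [["hdfs", "dfsadmin", "-finalizeUpgrade"]]

# Phase 3b, instead of 3a: stop HDFS, reinstall the 2.5.1 binaries by
# hand, then start them with -rollback to return to the pre-upgrade state.
ROLLBACK_STOP = [["stop-dfs.sh"]]
ROLLBACK_START = [["start-dfs.sh", "-rollback"]]


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    done = []
    for argv in steps:
        line = shlex.join(argv)
        print(("[dry-run] " if dry_run else "+ ") + line)
        if not dry_run:
            # check=True aborts the sequence at the first failing command.
            subprocess.run(argv, check=True)
        done.append(line)
    return done


if __name__ == "__main__":
    run(PRE_UPGRADE)
    run(START_UPGRADE)
```

Keeping the phases as separate lists, rather than one script, leaves room for the manual binary swap and for a verification period before finalizing. Note that the rolling-upgrade guide linked in the original question uses a different command family (hdfs dfsadmin -rollingUpgrade ...); this sketch follows the classic -upgrade / -finalizeUpgrade path instead.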

HTH
Ravi

On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Thanks for the information, Ravi. Is there a way I can back up the data
> before the update? I was thinking about this approach:
>
> Copy the current Hadoop directories to a new set of directories.
> Point Hadoop to this new set.
> Start the migration with the backup set.
>
> Please let me know if people have done this upgrade successfully. I
> believe many things can go wrong in a lengthy upgrade like this. The data
> in the cluster is very important.
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>>    - When we upgrade, does it change the data structures on both the
>>    namenode and the datanodes? I assume it only changes the namenode.
>>
>> It changes the NN as well as the DN layout. In fact, this upgrade will
>> also take a long time on the DataNodes because of
>> https://issues.apache.org/jira/browse/HDFS-6482
>>
>>    - What are the risks with this upgrade?
>>
>> What Hadoop applications do you run on top of your cluster? The hope is
>> that everything continues working smoothly for the most part, but
>> inevitably some backward-incompatible changes creep in.
>>
>>    - Is there a place where I can review the changes made to the file
>>    system from 2.5.1 to 2.7.2?
>>
>> The release notes (http://hadoop.apache.org/releases.html). You'd have to
>> collect the changes from every release between the two versions.
>>
>> Practically, I'd try running your applications on your upgraded test
>> cluster.
>>
>> HTH
>>
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a Hadoop production deployment with 1 namenode and 10
>>> datanodes, holding more than 20TB of data in HDFS. We are currently
>>> using Hadoop 2.5.1 and we want to update it to the latest Hadoop
>>> version, 2.7.2.
>>>
>>> I followed this guide (
>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>> and upgraded a single-node system running in pseudo-distributed mode,
>>> and it went without any issues. But that system did not have as much
>>> data as the production system.
>>>
>>> Since this is a production system, I'm reluctant to do this update. I
>>> would like to hear what other people have done in these cases and about
>>> their experiences. Here are a few questions I have:
>>>
>>>    - When we upgrade, does it change the data structures on both the
>>>    namenode and the datanodes? I assume it only changes the namenode.
>>>    - What are the risks with this upgrade?
>>>    - Is there a place where I can review the changes made to the file
>>>    system from 2.5.1 to 2.7.2?
>>>
>>> I would really appreciate it if you could share your experiences.
>>>
>>> Thanks in advance,
>>> Chathuri
>>>
>>
>>
>

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Chathuri Wimalasena <ka...@gmail.com>.
Thanks for the information, Ravi. Is there a way I can back up the data
before the update? I was thinking about this approach:

Copy the current Hadoop directories to a new set of directories.
Point Hadoop to this new set.
Start the migration with the backup set.
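For the NameNode metadata, the first step could look like the Python sketch below. Assumptions: the NameNode is stopped (or at least checkpointed and in safe mode), the paths passed in are whatever dfs.namenode.name.dir lists on your cluster, and backup_name_dirs is a hypothetical helper, not part of Hadoop. It copies only the metadata, which is small. Copying 20TB of DataNode block data the same way is a much bigger job; my understanding is that the DataNode side of the upgrade keeps the pre-upgrade block files (via hard links in a "previous" directory) until the upgrade is finalized, so a full data copy is not what makes rollback possible.

```python
import shutil
import time
from pathlib import Path


def _count_files(root):
    """Number of regular files below root (used as a cheap sanity check)."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def backup_name_dirs(name_dirs, backup_root, stamp=None):
    """Copy each NameNode metadata dir to backup_root/<stamp>/<index>-<name>.

    name_dirs: the local directories listed in dfs.namenode.name.dir.
    Returns the list of backup directories that were created.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest_root = Path(backup_root) / stamp
    created = []
    for i, src in enumerate(Path(d) for d in name_dirs):
        # A formatted name dir contains current/VERSION; refuse anything else.
        if not (src / "current" / "VERSION").is_file():
            raise FileNotFoundError(f"{src} does not look like a NameNode dir")
        dest = dest_root / f"{i}-{src.name}"
        # copytree creates missing parents and raises FileExistsError if
        # dest already exists, so an earlier backup is never overwritten.
        shutil.copytree(src, dest, symlinks=True)
        # File counts must match; a checksum comparison would be stricter.
        if _count_files(src) != _count_files(dest):
            raise RuntimeError(f"file count mismatch after copying {src}")
        created.append(dest)
    return created
```

For example, backup_name_dirs(["/data/hadoop/namenode"], "/backup/nn") would copy one (hypothetical) name dir into a timestamped folder; ideally the backup root lives on a machine outside the cluster.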

Please let me know if people have done this upgrade successfully. I believe
many things can go wrong in a lengthy upgrade like this. The data in the
cluster is very important.
Thanks,
Chathuri

On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com> wrote:

> Hi Chathuri!
>
>    - When we upgrade, does it change the data structures on both the
>    namenode and the datanodes? I assume it only changes the namenode.
>
> It changes the NN as well as the DN layout. In fact, this upgrade will
> also take a long time on the DataNodes because of
> https://issues.apache.org/jira/browse/HDFS-6482
>
>    - What are the risks with this upgrade?
>
> What Hadoop applications do you run on top of your cluster? The hope is
> that everything continues working smoothly for the most part, but
> inevitably some backward-incompatible changes creep in.
>
>    - Is there a place where I can review the changes made to the file
>    system from 2.5.1 to 2.7.2?
>
> The release notes (http://hadoop.apache.org/releases.html). You'd have to
> collect the changes from every release between the two versions.
>
> Practically, I'd try running your applications on your upgraded test
> cluster.
>
> HTH
>
> Ravi
>
> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
> kamalasini@gmail.com> wrote:
>
>> Hi,
>>
>> We have a Hadoop production deployment with 1 namenode and 10 datanodes,
>> holding more than 20TB of data in HDFS. We are currently using Hadoop
>> 2.5.1 and we want to update it to the latest Hadoop version, 2.7.2.
>>
>> I followed this guide (
>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>> and upgraded a single-node system running in pseudo-distributed mode, and
>> it went without any issues. But that system did not have as much data as
>> the production system.
>>
>> Since this is a production system, I'm reluctant to do this update. I
>> would like to hear what other people have done in these cases and about
>> their experiences. Here are a few questions I have:
>>
>>    - When we upgrade, does it change the data structures on both the
>>    namenode and the datanodes? I assume it only changes the namenode.
>>    - What are the risks with this upgrade?
>>    - Is there a place where I can review the changes made to the file
>>    system from 2.5.1 to 2.7.2?
>>
>> I would really appreciate it if you could share your experiences.
>>
>> Thanks in advance,
>> Chathuri
>>
>
>

Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!

   - When we upgrade, does it change the data structures on both the
   namenode and the datanodes? I assume it only changes the namenode.

It changes the NN as well as the DN layout. In fact, this upgrade will also
take a long time on the DataNodes because of
https://issues.apache.org/jira/browse/HDFS-6482
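To make the DataNode point concrete: as I read HDFS-6482, finalized block files move from the old directory layout to one derived from the block ID, so during the upgrade every replica (block file plus its .meta file) has to be re-linked into a new path. The Python sketch below mirrors that mapping and gives a rough file count per DataNode. Assumptions: the 256 x 256 subdirectory fan-out (0xFF masks) of the 2.6/2.7 era (later releases changed the fan-out), a 128 MB block size, replication 3, mostly full blocks, and that the 20TB figure is logical (pre-replication) data. Both function names are hypothetical.

```python
def block_subdir(block_id: int) -> str:
    """Subdirectory under finalized/ for a block in the block-ID layout.

    Assumes 2.6/2.7-era masks: bits 16-23 pick the first level and
    bits 8-15 the second, giving up to 256 x 256 directories.
    """
    d1 = (block_id >> 16) & 0xFF
    d2 = (block_id >> 8) & 0xFF
    return f"subdir{d1}/subdir{d2}"


def files_per_datanode(logical_bytes: int, datanodes: int,
                       block_size: int = 128 * 1024**2,
                       replication: int = 3) -> int:
    """Rough number of block + .meta files each DataNode must re-link.

    Assumes full blocks spread evenly; many small files mean more blocks.
    """
    blocks = -(-logical_bytes // block_size)  # ceiling division
    replicas = blocks * replication
    return replicas * 2 // datanodes  # one block file + one .meta per replica


if __name__ == "__main__":
    print(block_subdir(0x12345678))              # subdir52/subdir86
    print(files_per_datanode(20 * 1024**4, 10))  # 98304
```

Under these assumptions each of the 10 DataNodes handles roughly 98,000 files. The cost is driven by the file count rather than the byte count: hard links do not copy data, but touching every block file on every disk still takes time. If the 20TB figure already includes replication, divide the estimate by three.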

   - What are the risks with this upgrade?

What Hadoop applications do you run on top of your cluster? The hope is
that everything continues working smoothly for the most part, but
inevitably some backward-incompatible changes creep in.

   - Is there a place where I can review the changes made to the file
   system from 2.5.1 to 2.7.2?

The release notes (http://hadoop.apache.org/releases.html). You'd have to
collect the changes from every release between the two versions.

Practically, I'd try running your applications on your upgraded test cluster.

HTH

Ravi

On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <ka...@gmail.com>
wrote:

> Hi,
>
> We have a Hadoop production deployment with 1 namenode and 10 datanodes,
> holding more than 20TB of data in HDFS. We are currently using Hadoop
> 2.5.1 and we want to update it to the latest Hadoop version, 2.7.2.
>
> I followed this guide (
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
> and upgraded a single-node system running in pseudo-distributed mode, and
> it went without any issues. But that system did not have as much data as
> the production system.
>
> Since this is a production system, I'm reluctant to do this update. I
> would like to hear what other people have done in these cases and about
> their experiences. Here are a few questions I have:
>
>    - When we upgrade, does it change the data structures on both the
>    namenode and the datanodes? I assume it only changes the namenode.
>    - What are the risks with this upgrade?
>    - Is there a place where I can review the changes made to the file
>    system from 2.5.1 to 2.7.2?
>
> I would really appreciate it if you could share your experiences.
>
> Thanks in advance,
> Chathuri
>
