Posted to common-user@hadoop.apache.org by Chathuri Wimalasena <ka...@gmail.com> on 2016/03/23 20:17:41 UTC
Upgrading production hadoop system from 2.5.1 to 2.7.2
Hi,
We have a Hadoop production deployment with 1 name node and 10 data nodes
which has more than 20TB of data in HDFS. We are currently using Hadoop
2.5.1 and we want to update it to the latest Hadoop version, 2.7.2.
I followed this guide (
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
and updated a single-node system running in pseudo-distributed mode, and it
went without any issues. But that system did not have as much data as the
production system.
Since this is a production system, I'm reluctant to do this update. I would
like to hear what other people have done in these cases and their
experiences. Here are a few questions I have:
- When we upgrade, does it change the namenode data structures and data
nodes? I assume it only changes the name node...
- What are the risks with this upgrade?
- Is there a place where I can review the changes made to the file system
from 2.5.1 to 2.7.2?
I would really appreciate it if you could share your experiences.
Thanks in advance,
Chathuri
Re: Upgrading production
Posted by Musty Rehmani <mu...@yahoo.com.INVALID>.
Keep a backup of the metadata before the upgrade, preferably on a local machine. Do not finalize the upgrade until you are satisfied with data availability.
Musty
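A minimal sketch of one way to take such a metadata backup, assuming the NameNode is up and reachable. It uses `hdfs dfsadmin -fetchImage`, which downloads the most recent fsimage to a local directory. The `BACKUP_DIR` path, the `DRY_RUN` switch, and the function names are hypothetical and exist only for this illustration; by default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only: BACKUP_DIR, DRY_RUN and the function names are assumptions.
BACKUP_DIR="${BACKUP_DIR:-/tmp/nn-meta-backup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_namenode_metadata() {
    run mkdir -p "$BACKUP_DIR"
    # Download the most recent fsimage from the NameNode to the local directory.
    run hdfs dfsadmin -fetchImage "$BACKUP_DIR"
}

backup_namenode_metadata
```

Setting `DRY_RUN=0` runs the commands for real. This copies only the fsimage, not the block data on the datanodes.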
On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com> wrote:
Hi Chathuri!
Technically there is a rollback option during upgrade. I don't know how well it has been tested, but the idea is that old metadata is not deleted until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm fairly confident that the HDFS upgrade will work smoothly. We have upgraded quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never having to roll back). It's your applications that work on top of HDFS and YARN that I'd be concerned about.
HTH
Ravi
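The "check first, finalize later" idea above could be sketched roughly as follows. The commands `hdfs dfsadmin -report`, `hdfs fsck /`, and `hdfs dfsadmin -finalizeUpgrade` are standard HDFS commands; the `DRY_RUN` switch and the function names are hypothetical, and which checks are sufficient for a given cluster is an assumption, not something stated in this thread.

```shell
#!/bin/sh
# Sketch only: DRY_RUN and the function names are assumptions for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Health checks on the upgraded, not yet finalized cluster.
check_cluster() {
    run hdfs dfsadmin -report   # datanode status and capacity
    run hdfs fsck /             # missing or corrupt blocks
}

# Run this only once the checks and the applications look good:
# after finalizing, the pre-upgrade state is no longer kept.
finalize_upgrade() {
    run hdfs dfsadmin -finalizeUpgrade
}

check_cluster
```

`finalize_upgrade` is deliberately not called by the script; it is meant to be invoked by hand after the applications have been verified on the upgraded cluster.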
On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <ka...@gmail.com> wrote:
Thanks for the information, Ravi. Is there a way that I can back up data before the update? I was thinking about this approach:
Copy the current hadoop directories to a new set of directories.
Point hadoop to this new set.
Start the migration with the backup set.
Please let me know if people have done this upgrade successfully. I believe many things can go wrong in a lengthy upgrade like this. The data in the cluster is very important.
Thanks,
Chathuri
On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com> wrote:
Hi Chathuri!
- When we upgrade, does it change the namenode data structures and data nodes? I assume it only changes the name node...
It changes the NN as well as DN layout. As a matter of fact, this upgrade will take a long time on Datanodes as well because of https://issues.apache.org/jira/browse/HDFS-6482
- What are the risks with this upgrade ?
What Hadoop applications do you run on top of your cluster? The hope is that everything continues working smoothly for the most part, but inevitably some backward incompatible changes creep in.
- Is there a place where I can review the changes made to file system from 2.5.1 to 2.7.2?
The release notes: http://hadoop.apache.org/releases.html . You'd have to accumulate all the changes across the versions.
Practically, I'd try to run my application on your upgraded test cluster.
HTH
Ravi
On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <ka...@gmail.com> wrote:
Hi,
We have a hadoop production deployment with 1 name node and 10 data nodes which has more than 20TB of data in HDFS. We are currently using Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
I followed the following link (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html) and updated a single node system running in pseudo distributed mode and it went without any issues. But this system did not have that much data as the production system.
Since this is a production system, I'm reluctant to do this update. I would like to see what other people have done in these cases and their experiences... Here are few questions I have..
- When we upgrade, does it change the namenode data structures and data nodes? I assume it only changes the name node...
- What are the risks with this upgrade ?
- Is there a place where I can review the changes made to file system from 2.5.1 to 2.7.2?
I would really appreciate if you can share your experiences.
Thanks in advance,
Chathuri
unsubscribe
Posted by Marco Reis <ma...@marcoreis.net>.
On Thu, Mar 24, 2016 at 5:16 AM Chathuri Wimalasena <ka...@gmail.com>
wrote:
> Hi Ravi,
>
> Thank you for all the information. Our application indexes Twitter
> data into HBase and then runs data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took you to upgrade from
> 2.4.1 to 2.7.1?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). It's your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for information Ravi. Is there a way that I can back up data
>>> before the update ? I was thinking about this approach..
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>> - When we upgrade, does it change the namenode data structures and
>>>> data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>> - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>> - Is there a place where I can review the changes made to file
>>>> system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate all the changes in the versions.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed the following link (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>> went without any issues. But this system did not have that much data as the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences... Here are few questions I have..
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>> - What are the risks with this upgrade ?
>>>>> - Is there a place where I can review the changes made to file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Chathuri Wimalasena <ka...@gmail.com>.
Hi Ravi,
We have 10 data nodes. Each data node has 12 disks mounted and each data
node contains nearly 20 TB.
Filesystem             1K-blocks       Used  Available Use% Mounted on
/dev/md0               113183272   12830044   94597128  12% /
tmpfs                   66061772          0   66061772   0% /dev/shm
/dev/sdc1             3905108984 1847318400 2057790584  48% /data/sda
/dev/sdd1             3905108984 1766808072 2138300912  46% /data/sdb
/dev/sde1             3905108984 1762628972 2142480012  46% /data/sdc
/dev/sdf1             3905108984 1762803256 2142305728  46% /data/sdd
/dev/sdg1             3905108984 1757301724 2147807260  46% /data/sde
/dev/sdh1             3905108984 1764210768 2140898216  46% /data/sdf
/dev/sdi1             3905108984 1754803788 2150305196  45% /data/sdg
/dev/sdj1             3905108984 1753740904 2151368080  45% /data/sdh
/dev/sdk1             3905108984 1758186416 2146922568  46% /data/sdi
/dev/sdl1             3905108984 1757352332 2147756652  46% /data/sdj
/dev/sdm1             3905108984 1759121952 2145987032  46% /data/sdk
/dev/sdn1             3905108984 2991279120  913829864  77% /data/sdl
10.10.2.54:/home       113183744   55836672   51591168  52% /home
10.10.2.54:/vol/home1  976283648   93448192  882835456  10% /vol/home1
10.10.2.54:/vol/home2  976284672  412706816  563577856  43% /vol/home2
10.10.2.54:/vol/home3  976284672   51256320  925028352   6% /vol/home3
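As a rough cross-check of the per-node figure, the "Used" column of the df output above can be summed for the data disks. This is a sketch that assumes the HDFS block directories live on the `/data/*` mounts (the thread does not say so explicitly); the function name is hypothetical.

```shell
#!/bin/sh
# Sum the "Used" column (in 1K-blocks) of df output read from stdin,
# restricted to mount points under /data/, and print the total in TiB.
# Assumption: the HDFS data directories are on the /data/* mounts.
used_tib_on_data_mounts() {
    awk '$NF ~ /^\/data\// { used += $3 }
         END { printf "%.2f\n", used / (1024 * 1024 * 1024) }'
}

# Usage on a datanode:
#   df -k | used_tib_on_data_mounts
```

For the twelve `/data/*` rows listed above, the Used column adds up to 22,435,555,704 1K-blocks, which is about 20.9 TiB on this node.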
Thanks,
Chathuri
On Thu, Mar 24, 2016 at 2:45 PM, Ravi Prakash <ra...@gmail.com> wrote:
> Hi Chathuri!
>
> You're welcome! We did not have an HBase instance to upgrade. It depends
> on how many blocks your datanodes are storing (== how big your disks are *
> how many disks you have * how full your disks are). What are those numbers
> for you? We experienced anywhere from 1-3 hours for the upgrade.
>
> HTH
> Ravi
>
> On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <kamalasini@gmail.com
> > wrote:
>
>> Hi Ravi,
>>
>> Thank you for all the information. Our application indexes Twitter
>> data into HBase and then runs data analytics on top of that. That's why
>> HDFS data is very important to us. We cannot tolerate any data loss with
>> the update. Do you remember how long it took you to upgrade from
>> 2.4.1 to 2.7.1?
>>
>> Thanks,
>> Chathuri
>>
>> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
>> wrote:
>>
>>> Hi Chathuri!
>>>
>>> Technically there is a rollback option during upgrade. I don't know how
>>> well it has been tested, but the idea is that old metadata is not deleted
>>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>>> having to roll back). It's your applications that work on top of HDFS and
>>> YARN that I'd be concerned about.
>>>
>>> HTH
>>> Ravi
>>>
>>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>>> kamalasini@gmail.com> wrote:
>>>
>>>> Thanks for information Ravi. Is there a way that I can back up data
>>>> before the update ? I was thinking about this approach..
>>>>
>>>> Copy the current hadoop directories to a new set of directories.
>>>> Point hadoop to this new set
>>>> Start the migration with the backup set
>>>>
>>>> Please let me know if people have done this upgrade successfully. I
>>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>>> in the cluster is very important.
>>>> Thanks,
>>>> Chathuri
>>>>
>>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Chathuri!
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>>
>>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>>> upgrade will take a long time on Datanodes as well because of
>>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>>
>>>>> - What are the risks with this upgrade ?
>>>>>
>>>>> What Hadoop applications do you run on top of your cluster? The hope
>>>>> is that everything continues working smoothly for the most part, but
>>>>> inevitably some backward incompatible changes creep in.
>>>>>
>>>>> - Is there a place where I can review the changes made to file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>>> to accumulate all the changes in the versions.
>>>>>
>>>>> Practically, I'd try to run my application on your upgraded test
>>>>> cluster.
>>>>>
>>>>> HTH
>>>>>
>>>>> Ravi
>>>>>
>>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>>> kamalasini@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>>
>>>>>> I followed the following link (
>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>>> went without any issues. But this system did not have that much data as the
>>>>>> production system.
>>>>>>
>>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>>> would like to see what other people have done in these cases and their
>>>>>> experiences... Here are few questions I have..
>>>>>>
>>>>>> - When we upgrade, does it change the namenode data structures
>>>>>> and data nodes? I assume it only changes the name node...
>>>>>> - What are the risks with this upgrade ?
>>>>>> - Is there a place where I can review the changes made to file
>>>>>> system from 2.5.1 to 2.7.2?
>>>>>>
>>>>>> I would really appreciate if you can share your experiences.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Chathuri
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Chathuri Wimalasena <ka...@gmail.com>.
Hi Ravi,
We have 10 data nodes. Each data node has 12 disks mounted and each data
node contains nearly 20 TB.
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md0 113183272 12830044 94597128 12% /
tmpfs 66061772 0 66061772 0% /dev/shm
/dev/sdc1 3905108984 1847318400 2057790584 48% /data/sda
/dev/sdd1 3905108984 1766808072 2138300912 46% /data/sdb
/dev/sde1 3905108984 1762628972 2142480012 46% /data/sdc
/dev/sdf1 3905108984 1762803256 2142305728 46% /data/sdd
/dev/sdg1 3905108984 1757301724 2147807260 46% /data/sde
/dev/sdh1 3905108984 1764210768 2140898216 46% /data/sdf
/dev/sdi1 3905108984 1754803788 2150305196 45% /data/sdg
/dev/sdj1 3905108984 1753740904 2151368080 45% /data/sdh
/dev/sdk1 3905108984 1758186416 2146922568 46% /data/sdi
/dev/sdl1 3905108984 1757352332 2147756652 46% /data/sdj
/dev/sdm1 3905108984 1759121952 2145987032 46% /data/sdk
/dev/sdn1 3905108984 2991279120 913829864 77% /data/sdl
10.10.2.54:/home 113183744 55836672 51591168 52% /home
10.10.2.54:/vol/home1
976283648 93448192 882835456 10% /vol/home1
10.10.2.54:/vol/home2
976284672 412706816 563577856 43% /vol/home2
10.10.2.54:/vol/home3
976284672 51256320 925028352 6% /vol/home3
Thanks,
Chathuri
On Thu, Mar 24, 2016 at 2:45 PM, Ravi Prakash <ra...@gmail.com> wrote:
> Hi Chathuri!
>
> You're welcome! We did not have an HBase instance to upgrade. It depends
> on how many blocks your datanodes are storing (== how big your disks are *
> how many disks you have * how full your disks are). What are those numbers
> for you? We experienced anywhere from 1-3 hours for the upgrade.
>
> HTH
> Ravi
>
> On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <kamalasini@gmail.com
> > wrote:
>
>> Hi Ravi,
>>
>> Thank you for all the information, Our application is indexing twitter
>> data to HBase and then do some data analytics on top of that. That's why
>> HDFS data is very important to us. We cannot tolerate any data loss with
>> the update. Do you remember how long it took for you to upgrade it from
>> 2.4.1 to 2.7.1 ?
>>
>> Thanks,
>> Chathuri
>>
>> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
>> wrote:
>>
>>> Hi Chathuri!
>>>
>>> Technically there is a rollback option during upgrade. I don't know how
>>> well it has been tested, but the idea is that old metadata is not deleted
>>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>>> having to roll back). Its your applications that work on top of HDFS and
>>> YARN that I'd be concerned about.
>>>
>>> HTH
>>> Ravi
>>>
>>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>>> kamalasini@gmail.com> wrote:
>>>
>>>> Thanks for information Ravi. Is there a way that I can back up data
>>>> before the update ? I was thinking about this approach..
>>>>
>>>> Copy the current hadoop directories to a new set of directories.
>>>> Point hadoop to this new set
>>>> Start the migration with the backup set
>>>>
>>>> Please let me know if people have done this upgrade successfully. I
>>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>>> in the cluster is very important.
>>>> Thanks,
>>>> Chathuri
>>>>
>>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Chathuri!
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>>
>>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>>> upgrade will take a long time on Datanodes as well because of
>>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>>
>>>>> - What are the risks with this upgrade ?
>>>>>
>>>>> What Hadoop applications do you run on top of your cluster? The hope
>>>>> is that everything continues working smoothly for the most part, but
>>>>> inevitably some backward incompatible changes creep in.
>>>>>
>>>>> - Is there a place where I can review the changes made to file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> The release notes. http://hadoop.apache.org/releases.html .You'd have
>>>>> to accumulate all the changes in the versions.
>>>>>
>>>>> Practically, I'd try to run my application on your upgraded test
>>>>> cluster.
>>>>>
>>>>> HTH
>>>>>
>>>>> Ravi
>>>>>
>>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>>> kamalasini@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>>
>>>>>> I followed the following link (
>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>>> went without any issues. But this system did not have that much data as the
>>>>>> production system.
>>>>>>
>>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>>> would like to see what other people have done in these cases and their
>>>>>> experiences... Here are few questions I have..
>>>>>>
>>>>>> - When we upgrade, does it change the namenode data structures
>>>>>> and data nodes? I assume it only changes the name node...
>>>>>> - What are the risks with this upgrade ?
>>>>>> - Is there a place where I can review the changes made to file
>>>>>> system from 2.5.1 to 2.7.2?
>>>>>>
>>>>>> I would really appreciate if you can share your experiences.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Chathuri
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Chathuri Wimalasena <ka...@gmail.com>.
Hi Ravi,
We have 10 data nodes. Each data node has 12 disks mounted and each data
node contains nearly 20 TB.
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md0 113183272 12830044 94597128 12% /
tmpfs 66061772 0 66061772 0% /dev/shm
/dev/sdc1 3905108984 1847318400 2057790584 48% /data/sda
/dev/sdd1 3905108984 1766808072 2138300912 46% /data/sdb
/dev/sde1 3905108984 1762628972 2142480012 46% /data/sdc
/dev/sdf1 3905108984 1762803256 2142305728 46% /data/sdd
/dev/sdg1 3905108984 1757301724 2147807260 46% /data/sde
/dev/sdh1 3905108984 1764210768 2140898216 46% /data/sdf
/dev/sdi1 3905108984 1754803788 2150305196 45% /data/sdg
/dev/sdj1 3905108984 1753740904 2151368080 45% /data/sdh
/dev/sdk1 3905108984 1758186416 2146922568 46% /data/sdi
/dev/sdl1 3905108984 1757352332 2147756652 46% /data/sdj
/dev/sdm1 3905108984 1759121952 2145987032 46% /data/sdk
/dev/sdn1 3905108984 2991279120 913829864 77% /data/sdl
10.10.2.54:/home 113183744 55836672 51591168 52% /home
10.10.2.54:/vol/home1
976283648 93448192 882835456 10% /vol/home1
10.10.2.54:/vol/home2
976284672 412706816 563577856 43% /vol/home2
10.10.2.54:/vol/home3
976284672 51256320 925028352 6% /vol/home3
Thanks,
Chathuri
On Thu, Mar 24, 2016 at 2:45 PM, Ravi Prakash <ra...@gmail.com> wrote:
> Hi Chathuri!
>
> You're welcome! We did not have an HBase instance to upgrade. It depends
> on how many blocks your datanodes are storing (== how big your disks are *
> how many disks you have * how full your disks are). What are those numbers
> for you? We experienced anywhere from 1-3 hours for the upgrade.
>
> HTH
> Ravi
>
> On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <kamalasini@gmail.com
> > wrote:
>
>> Hi Ravi,
>>
>> Thank you for all the information. Our application indexes Twitter
>> data into HBase and then runs some data analytics on top of that. That's why
>> HDFS data is very important to us. We cannot tolerate any data loss with
>> the update. Do you remember how long it took you to upgrade from
>> 2.4.1 to 2.7.1?
>>
>> Thanks,
>> Chathuri
>>
>> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
>> wrote:
>>
>>> Hi Chathuri!
>>>
>>> Technically there is a rollback option during upgrade. I don't know how
>>> well it has been tested, but the idea is that old metadata is not deleted
>>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>>> having to roll back). It's your applications that work on top of HDFS and
>>> YARN that I'd be concerned about.
>>>
>>> HTH
>>> Ravi
>>>
>>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>>> kamalasini@gmail.com> wrote:
>>>
>>>> Thanks for the information, Ravi. Is there a way that I can back up data
>>>> before the update? I was thinking about this approach:
>>>>
>>>> Copy the current hadoop directories to a new set of directories.
>>>> Point hadoop to this new set
>>>> Start the migration with the backup set
>>>>
>>>> Please let me know if people have done this upgrade successfully. I
>>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>>> in the cluster is very important.
>>>> Thanks,
>>>> Chathuri
>>>>
>>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Chathuri!
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>>
>>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>>> upgrade will take a long time on Datanodes as well because of
>>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>>
>>>>> - What are the risks with this upgrade ?
>>>>>
>>>>> What Hadoop applications do you run on top of your cluster? The hope
>>>>> is that everything continues working smoothly for the most part, but
>>>>> inevitably some backward incompatible changes creep in.
>>>>>
>>>>> - Is there a place where I can review the changes made to file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>>> to accumulate the changes across all the versions in between.
>>>>>
>>>>> Practically, I'd try to run my application on your upgraded test
>>>>> cluster.
>>>>>
>>>>> HTH
>>>>>
>>>>> Ravi
>>>>>
>>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>>> kamalasini@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>>
>>>>>> I followed the following link (
>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>>> went without any issues. But this system did not have that much data as the
>>>>>> production system.
>>>>>>
>>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>>> would like to see what other people have done in these cases and their
>>>>>> experiences... Here are a few questions I have:
>>>>>>
>>>>>> - When we upgrade, does it change the namenode data structures
>>>>>> and data nodes? I assume it only changes the name node...
>>>>>> - What are the risks with this upgrade ?
>>>>>> - Is there a place where I can review the changes made to file
>>>>>> system from 2.5.1 to 2.7.2?
>>>>>>
>>>>>> I would really appreciate if you can share your experiences.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Chathuri
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
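The backup and rollback points raised in the quoted messages above can be sketched as a small checklist in Python. The script only prints commands; it does not run them. The backup path is hypothetical, and the sequence is an illustration of the idea described in the thread (keep a copy of the NameNode metadata, and do not finalize until the upgraded cluster has been checked), not a tested procedure for this cluster.

```python
from typing import List

# Hypothetical local directory for the NameNode metadata copy.
BACKUP_DIR = "/backup/namenode-meta"


def pre_upgrade_commands(backup_dir: str) -> List[List[str]]:
    """Commands to run on the old version before shutting the cluster down."""
    return [
        # saveNamespace is only allowed while the NameNode is in safe mode.
        ["hdfs", "dfsadmin", "-safemode", "enter"],
        ["hdfs", "dfsadmin", "-saveNamespace"],
        # Download the most recent fsimage to a local directory.
        ["hdfs", "dfsadmin", "-fetchImage", backup_dir],
    ]


def post_upgrade_commands(checks_passed: bool) -> List[List[str]]:
    """Commands to run once the upgraded cluster has been checked."""
    if checks_passed:
        # Finalizing discards the pre-upgrade state, so rollback is no
        # longer possible afterwards.
        return [["hdfs", "dfsadmin", "-finalizeUpgrade"]]
    # Rollback is run after stopping the cluster and restoring the old release.
    return [["hdfs", "namenode", "-rollback"]]


print("Before the upgrade:")
for cmd in pre_upgrade_commands(BACKUP_DIR):
    print("  " + " ".join(cmd))

print("After checking the upgraded cluster (here: checks failed):")
for cmd in post_upgrade_commands(checks_passed=False):
    print("  " + " ".join(cmd))
```

Keeping the finalize step behind an explicit check reflects the point made in the thread that the old metadata is kept until the administrator finalizes the upgrade.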
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!
You're welcome! We did not have an HBase instance to upgrade. It depends on
how many blocks your datanodes are storing (== how big your disks are * how
many disks you have * how full your disks are). What are those numbers for
you? We experienced anywhere from 1-3 hours for the upgrade.
HTH
Ravi
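The rule of thumb in this reply (disk size * number of disks * how full the disks are) can be turned into a rough block count per datanode. A minimal sketch in Python follows. The average block size is an assumption: 128 MiB is the Hadoop 2.x default for dfs.blocksize, but real files, HBase files in particular, often end in smaller blocks, so the result is best read as a lower bound. The example inputs are illustrative.

```python
def estimate_block_replicas(disk_bytes, disks_per_node, fill_fraction, avg_block_bytes):
    """Rough number of block replicas stored on one datanode.

    disk_bytes      -- capacity of a single data disk, in bytes
    disks_per_node  -- number of data disks in the datanode
    fill_fraction   -- how full the disks are, between 0.0 and 1.0
    avg_block_bytes -- assumed average size of a stored block, in bytes
    """
    used_bytes = disk_bytes * disks_per_node * fill_fraction
    return int(used_bytes // avg_block_bytes)


# Illustrative inputs: twelve disks of roughly 4 TB each, a little under half full.
DISK_BYTES = 3905108984 * 1024        # one disk, converted from 1K blocks
DISKS_PER_NODE = 12
FILL_FRACTION = 0.48
AVG_BLOCK_BYTES = 128 * 1024 * 1024   # assumption: every block is a full 128 MiB

replicas = estimate_block_replicas(
    DISK_BYTES, DISKS_PER_NODE, FILL_FRACTION, AVG_BLOCK_BYTES
)
print(f"at least ~{replicas:,} block replicas per datanode")
```

Since the datanode layout upgrade has to process every block a node stores, a higher count suggests a longer upgrade. The 1-3 hour range above is the only timing reported in this thread.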
On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <ka...@gmail.com>
wrote:
> Hi Ravi,
>
> Thank you for all the information. Our application indexes Twitter
> data into HBase and then runs some data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took you to upgrade from
> 2.4.1 to 2.7.1?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). It's your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for the information, Ravi. Is there a way that I can back up data
>>> before the update? I was thinking about this approach:
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>> - When we upgrade, does it change the namenode data structures and
>>>> data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>> - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>> - Is there a place where I can review the changes made to file
>>>> system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate the changes across all the versions in between.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed the following link (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>> went without any issues. But this system did not have that much data as the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences... Here are a few questions I have:
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>> - What are the risks with this upgrade ?
>>>>> - Is there a place where I can review the changes made to file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>
unsubscribe
Posted by Marco Reis <ma...@marcoreis.net>.
On Thu, Mar 24, 2016 at 5:16 AM Chathuri Wimalasena <ka...@gmail.com>
wrote:
> Hi Ravi,
>
> Thank you for all the information. Our application indexes Twitter
> data into HBase and then runs some data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took you to upgrade from
> 2.4.1 to 2.7.1?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). It's your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for the information, Ravi. Is there a way that I can back up data
>>> before the update? I was thinking about this approach:
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>> - When we upgrade, does it change the namenode data structures and
>>>> data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>> - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>> - Is there a place where I can review the changes made to file
>>>> system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate the changes across all the versions in between.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed the following link (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>> went without any issues. But this system did not have that much data as the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences... Here are a few questions I have:
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>> - What are the risks with this upgrade ?
>>>>> - Is there a place where I can review the changes made to file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!
You're welcome! We did not have an HBase instance to upgrade. How long the
upgrade takes depends on how many blocks your datanodes are storing (roughly
how big your disks are * how many disks you have * how full your disks are).
What are those numbers for you? Our upgrades took anywhere from 1 to 3 hours.
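To make that arithmetic concrete, here is a rough sketch of a per-datanode
block estimate. The 128 MiB block size, the replication factor of 3, and the
reading of the 20TB from the original post as logical (pre-replication) data
are all assumptions, not numbers confirmed in this thread, and
estimate_replicas_per_dn is a made-up helper name.

```shell
#!/usr/bin/env bash
# Rough estimate of block replicas stored on each datanode.
# Assumptions (not from the thread): 128 MiB blocks, replication factor 3,
# and files that fill their blocks completely (small files raise the count).

estimate_replicas_per_dn() {
  local data_tib=$1            # logical data in HDFS, in TiB
  local datanodes=$2           # number of datanodes
  local block_mib=${3:-128}    # block size in MiB (assumed default)
  local replication=${4:-3}    # replication factor (assumed default)

  local data_mib=$(( data_tib * 1024 * 1024 ))
  # Round up: a partially filled last block still counts as one block.
  local blocks=$(( (data_mib + block_mib - 1) / block_mib ))
  echo $(( blocks * replication / datanodes ))
}

estimate_replicas_per_dn 20 10   # prints 49152
```

Under those assumptions each of the 10 datanodes would hold on the order of
49,000 block replicas. The real block count is reported by hdfs fsck / and is
the number to compare against.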
HTH
Ravi
On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <ka...@gmail.com>
wrote:
> Hi Ravi,
>
> Thank you for all the information, Our application is indexing twitter
> data to HBase and then do some data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took for you to upgrade it from
> 2.4.1 to 2.7.1 ?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). Its your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Thanks for information Ravi. Is there a way that I can back up data
>>> before the update ? I was thinking about this approach..
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>> - When we upgrade, does it change the namenode data structures and
>>>> data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>> - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>> - Is there a place where I can review the changes made to file
>>>> system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes. http://hadoop.apache.org/releases.html .You'd have
>>>> to accumulate all the changes in the versions.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed the following link (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>> went without any issues. But this system did not have that much data as the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences... Here are few questions I have..
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>> - What are the risks with this upgrade ?
>>>>> - Is there a place where I can review the changes made to file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Chathuri Wimalasena <ka...@gmail.com>.
Hi Ravi,
Thank you for all the information. Our application indexes Twitter data
into HBase and then does some data analytics on top of that. That's why the
HDFS data is very important to us. We cannot tolerate any data loss with the
update. Do you remember how long it took you to upgrade from 2.4.1 to
2.7.1?
Thanks,
Chathuri
On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ra...@gmail.com> wrote:
> Hi Chathuri!
>
> Technically there is a rollback option during upgrade. I don't know how
> well it has been tested, but the idea is that old metadata is not deleted
> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
> having to roll back). Its your applications that work on top of HDFS and
> YARN that I'd be concerned about.
>
> HTH
> Ravi
>
> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <kamalasini@gmail.com
> > wrote:
>
>> Thanks for information Ravi. Is there a way that I can back up data
>> before the update ? I was thinking about this approach..
>>
>> Copy the current hadoop directories to a new set of directories.
>> Point hadoop to this new set
>> Start the migration with the backup set
>>
>> Please let me know if people have done this upgrade successfully. I
>> believe many things can go wrong in a lengthy upgrade like this. The data
>> in the cluster is very important.
>> Thanks,
>> Chathuri
>>
>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
>> wrote:
>>
>>> Hi Chathuri!
>>>
>>> - When we upgrade, does it change the namenode data structures and
>>> data nodes? I assume it only changes the name node...
>>>
>>> It changes the NN as well as DN layout. As a matter of fact, this
>>> upgrade will take a long time on Datanodes as well because of
>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>
>>> - What are the risks with this upgrade ?
>>>
>>> What Hadoop applications do you run on top of your cluster? The hope is
>>> that everything continues working smoothly for the most part, but
>>> inevitably some backward incompatible changes creep in.
>>>
>>> - Is there a place where I can review the changes made to file
>>> system from 2.5.1 to 2.7.2?
>>>
>>> The release notes. http://hadoop.apache.org/releases.html .You'd have
>>> to accumulate all the changes in the versions.
>>>
>>> Practically, I'd try to run my application on your upgraded test cluster.
>>>
>>> HTH
>>>
>>> Ravi
>>>
>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>> kamalasini@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>
>>>> I followed the following link (
>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>> and updated a single node system running in pseudo distributed mode and it
>>>> went without any issues. But this system did not have that much data as the
>>>> production system.
>>>>
>>>> Since this is a production system, I'm reluctant to do this update. I
>>>> would like to see what other people have done in these cases and their
>>>> experiences... Here are few questions I have..
>>>>
>>>> - When we upgrade, does it change the namenode data structures and
>>>> data nodes? I assume it only changes the name node...
>>>> - What are the risks with this upgrade ?
>>>> - Is there a place where I can review the changes made to file
>>>> system from 2.5.1 to 2.7.2?
>>>>
>>>> I would really appreciate if you can share your experiences.
>>>>
>>>> Thanks in advance,
>>>> Chathuri
>>>>
>>>
>>>
>>
>
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!
Technically there is a rollback option during upgrade. I don't know how
well it has been tested, but the idea is that old metadata is not deleted
until the cluster administrator runs $ hdfs dfsadmin -finalizeUpgrade. I'm
fairly confident that the HDFS upgrade will work smoothly. We have upgraded
quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
having to roll back). It's your applications that work on top of HDFS and
YARN that I'd be concerned about.
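A minimal sketch of how the "don't finalize until you are satisfied" idea
could be scripted. It only prints the commands unless DRY_RUN=0 is set. The
backup path, the function names and the run wrapper are hypothetical; the
hdfs subcommands used (fsck, dfsadmin -report, dfsadmin -fetchImage,
dfsadmin -finalizeUpgrade) are standard, but check them against the
documentation for your exact version before relying on this.

```shell
#!/usr/bin/env bash
# Sketch only: prints commands by default instead of executing them.
DRY_RUN=${DRY_RUN:-1}
BACKUP_DIR=${BACKUP_DIR:-/tmp/nn-meta-backup}   # hypothetical local path

# Echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Before the upgrade: record cluster health and copy the fsimage locally.
pre_upgrade() {
  run hdfs fsck /
  run hdfs dfsadmin -report
  run mkdir -p "$BACKUP_DIR"
  run hdfs dfsadmin -fetchImage "$BACKUP_DIR"
}

# After the upgrade: collect the same reports to compare with the earlier ones.
post_upgrade_check() {
  run hdfs fsck /
  run hdfs dfsadmin -report
}

# Deliberately separate: once this runs, the old metadata is gone
# and rollback is no longer possible.
finalize_upgrade() {
  run hdfs dfsadmin -finalizeUpgrade
}

pre_upgrade
```

Keeping finalize_upgrade as its own step means it is only called by hand,
after the post-upgrade reports and the applications on top of HDFS look
healthy.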
HTH
Ravi
On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <ka...@gmail.com>
wrote:
> Thanks for information Ravi. Is there a way that I can back up data before
> the update ? I was thinking about this approach..
>
> Copy the current hadoop directories to a new set of directories.
> Point hadoop to this new set
> Start the migration with the backup set
>
> Please let me know if people have done this upgrade successfully. I
> believe many things can go wrong in a lengthy upgrade like this. The data
> in the cluster is very important.
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> - When we upgrade, does it change the namenode data structures and
>> data nodes? I assume it only changes the name node...
>>
>> It changes the NN as well as DN layout. As a matter of fact, this upgrade
>> will take a long time on Datanodes as well because of
>> https://issues.apache.org/jira/browse/HDFS-6482
>>
>> - What are the risks with this upgrade ?
>>
>> What Hadoop applications do you run on top of your cluster? The hope is
>> that everything continues working smoothly for the most part, but
>> inevitably some backward incompatible changes creep in.
>>
>> - Is there a place where I can review the changes made to file system
>> from 2.5.1 to 2.7.2?
>>
>> The release notes. http://hadoop.apache.org/releases.html .You'd have to
>> accumulate all the changes in the versions.
>>
>> Practically, I'd try to run my application on your upgraded test cluster.
>>
>> HTH
>>
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>> kamalasini@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a hadoop production deployment with 1 name node and 10 data
>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>
>>> I followed the following link (
>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>> and updated a single node system running in pseudo distributed mode and it
>>> went without any issues. But this system did not have that much data as the
>>> production system.
>>>
>>> Since this is a production system, I'm reluctant to do this update. I
>>> would like to see what other people have done in these cases and their
>>> experiences... Here are few questions I have..
>>>
>>> - When we upgrade, does it change the namenode data structures and
>>> data nodes? I assume it only changes the name node...
>>> - What are the risks with this upgrade ?
>>> - Is there a place where I can review the changes made to file
>>> system from 2.5.1 to 2.7.2?
>>>
>>> I would really appreciate if you can share your experiences.
>>>
>>> Thanks in advance,
>>> Chathuri
>>>
>>
>>
>
>>>
>>
>>
>
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Chathuri Wimalasena <ka...@gmail.com>.
Thanks for the information, Ravi. Is there a way that I can back up the data
before the update? I was thinking about this approach:
- Copy the current hadoop directories to a new set of directories.
- Point hadoop to this new set.
- Start the migration with the backup set.
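The copy step could be sketched roughly as below, under the assumption that only the NameNode metadata is archived (not the 20TB of block data on the data nodes). The function name backup_nn_metadata and the example paths are hypothetical; on a real cluster the source directory would be one of the values of dfs.namenode.name.dir, which can be printed with hdfs getconf -confKey dfs.namenode.name.dir.

```shell
#!/usr/bin/env bash
# Sketch: archive a NameNode metadata directory before an upgrade.
# Arguments are hypothetical local paths:
#   $1 = a NameNode storage directory (one entry of dfs.namenode.name.dir,
#        given as a plain path rather than a file:// URI)
#   $2 = directory that receives the archive
backup_nn_metadata() {
  local name_dir="$1" backup_root="$2"
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$backup_root/nn-meta-$stamp.tar.gz"
  mkdir -p "$backup_root" || return 1
  # Archive the 'current' subdirectory, which holds fsimage, edits and VERSION.
  tar -czf "$archive" -C "$name_dir" current || return 1
  # Print the archive path so the caller can record or copy it elsewhere.
  echo "$archive"
}

# Example call (run with the NameNode stopped, or in safe mode after
# 'hdfs dfsadmin -saveNamespace'); both paths are placeholders:
#   backup_nn_metadata /data/hdfs/name /var/backups/hadoop
```

Copying the resulting archive off the NameNode host keeps a copy that does not depend on the cluster itself.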
Please let me know if people have done this upgrade successfully. I believe
many things can go wrong in a lengthy upgrade like this. The data in the
cluster is very important.
Thanks,
Chathuri
On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ra...@gmail.com> wrote:
> Hi Chathuri!
>
> - When we upgrade, does it change the namenode data structures and
> data nodes? I assume it only changes the name node...
>
> It changes the NN as well as DN layout. As a matter of fact, this upgrade
> will take a long time on Datanodes as well because of
> https://issues.apache.org/jira/browse/HDFS-6482
>
> - What are the risks with this upgrade ?
>
> What Hadoop applications do you run on top of your cluster? The hope is
> that everything continues working smoothly for the most part, but
> inevitably some backward incompatible changes creep in.
>
> - Is there a place where I can review the changes made to file system
> from 2.5.1 to 2.7.2?
>
> The release notes. http://hadoop.apache.org/releases.html .You'd have to
> accumulate all the changes in the versions.
>
> Practically, I'd try to run my application on your upgraded test cluster.
>
> HTH
>
> Ravi
>
> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
> kamalasini@gmail.com> wrote:
>
>> Hi,
>>
>> We have a hadoop production deployment with 1 name node and 10 data nodes
>> which has more than 20TB of data in HDFS. We are currently using Hadoop
>> 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>
>> I followed the following link (
>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>> and updated a single node system running in pseudo distributed mode and it
>> went without any issues. But this system did not have that much data as the
>> production system.
>>
>> Since this is a production system, I'm reluctant to do this update. I
>> would like to see what other people have done in these cases and their
>> experiences... Here are few questions I have..
>>
>> - When we upgrade, does it change the namenode data structures and
>> data nodes? I assume it only changes the name node...
>> - What are the risks with this upgrade ?
>> - Is there a place where I can review the changes made to file system
>> from 2.5.1 to 2.7.2?
>>
>> I would really appreciate if you can share your experiences.
>>
>> Thanks in advance,
>> Chathuri
>>
>
>
Re: Upgrading production hadoop system from 2.5.1 to 2.7.2
Posted by Ravi Prakash <ra...@gmail.com>.
Hi Chathuri!
- When we upgrade, does it change the namenode data structures and data
nodes? I assume it only changes the name node...
It changes the NN as well as DN layout. As a matter of fact, this upgrade
will take a long time on Datanodes as well because of
https://issues.apache.org/jira/browse/HDFS-6482
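One way to see the layout change for yourself is to compare the layoutVersion recorded in the VERSION file of a storage directory before and after upgrading the test cluster. A minimal sketch, assuming the usual <storage dir>/current/VERSION location; the function name layout_version and the example paths are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the layoutVersion recorded in an HDFS storage directory.
# $1 is a NameNode or DataNode storage directory (a plain local path).
layout_version() {
  local version_file="$1/current/VERSION"
  # VERSION is a properties-style file with a line of the form
  # layoutVersion=<negative integer>.
  sed -n 's/^layoutVersion=//p' "$version_file"
}

# Example calls (placeholder paths):
#   layout_version /data/hdfs/name
#   layout_version /data/hdfs/data
```

If the number differs between the 2.5.1 and 2.7.2 runs for a given directory, that node's on-disk layout was changed by the upgrade.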
- What are the risks with this upgrade ?
What Hadoop applications do you run on top of your cluster? The hope is
that everything continues working smoothly for the most part, but
inevitably some backward-incompatible changes creep in.
- Is there a place where I can review the changes made to file system
from 2.5.1 to 2.7.2?
The release notes: http://hadoop.apache.org/releases.html
You'd have to accumulate all the changes across the intermediate versions.
Practically, I'd try running the applications on your upgraded test cluster.
HTH
Ravi
On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <ka...@gmail.com>
wrote:
> Hi,
>
> We have a hadoop production deployment with 1 name node and 10 data nodes
> which has more than 20TB of data in HDFS. We are currently using Hadoop
> 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>
> I followed the following link (
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
> and updated a single node system running in pseudo distributed mode and it
> went without any issues. But this system did not have that much data as the
> production system.
>
> Since this is a production system, I'm reluctant to do this update. I
> would like to see what other people have done in these cases and their
> experiences... Here are few questions I have..
>
> - When we upgrade, does it change the namenode data structures and
> data nodes? I assume it only changes the name node...
> - What are the risks with this upgrade ?
> - Is there a place where I can review the changes made to file system
> from 2.5.1 to 2.7.2?
>
> I would really appreciate if you can share your experiences.
>
> Thanks in advance,
> Chathuri
>