Posted to common-user@hadoop.apache.org by Tom Brown <to...@gmail.com> on 2013/04/17 19:20:25 UTC

Physically moving HDFS cluster to new

We have a situation where we want to physically move our small (4 node)
cluster from one data center to another. As part of this move, each node
will receive both a new FQN and a new IP address. As I understand it, HDFS
is somehow tied to the FQN or IP address, and changing them causes data
loss.

Is there any supported method of moving a cluster this way?

Thanks in advance!

--Tom

Re: Physically moving HDFS cluster to new

Posted by MARCOS MEDRADO RUBINELLI <ma...@buscapecompany.com>.
Here's a rough guideline:

Moving a cluster isn't all that different from upgrading it. The initial steps are the same:
- stop your mapreduce services
- switch your namenode to safe mode
- generate a final image with -saveNamespace
- stop your hdfs services
- back up your metadata - as long as you have a copy of your metadata, there's a good chance you can recover a cluster without data loss
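
The shutdown steps above could be scripted roughly as follows. This is only a sketch, assuming a Hadoop 1.x install where `hadoop`, `stop-mapred.sh` and `stop-dfs.sh` are on the PATH. The names `SHUTDOWN_STEPS` and `run_shutdown` are made up for illustration, and the metadata backup is left out because its location depends on your own configuration.

```python
import subprocess

# Ordered shutdown commands; assumes Hadoop 1.x scripts are on the PATH.
SHUTDOWN_STEPS = [
    ("stop mapreduce services", ["stop-mapred.sh"]),
    ("put namenode in safe mode", ["hadoop", "dfsadmin", "-safemode", "enter"]),
    ("write a final image", ["hadoop", "dfsadmin", "-saveNamespace"]),
    ("stop hdfs services", ["stop-dfs.sh"]),
]


def run_shutdown(dry_run=True, runner=subprocess.check_call):
    """Run (or only list) the shutdown steps in order; stop on first failure."""
    executed = []
    for description, command in SHUTDOWN_STEPS:
        print("%s: %s" % (description, " ".join(command)))
        if not dry_run:
            runner(command)  # check_call raises on a non-zero exit status
        executed.append(command)
    return executed


if __name__ == "__main__":
    run_shutdown(dry_run=True)  # prints the plan without touching the cluster
```

Running it with the default dry_run=True only prints the plan, which is probably what you want until you have checked each command against your own version.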

Now, before you turn off and pack up your machines, it's a good idea to update your hosts, as Bejoy describes, assuming you have the new IPs in advance, of course. It isn't strictly necessary, but if your services are configured to start on bootup, it will save you the work of bringing them down, updating your hosts/XMLs, then bringing them up again.
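
If you do know the new addresses in advance, the hosts update could be prepared with something like the sketch below. The function name `rewrite_hosts` and the addresses in the usage example are invented for illustration. It only swaps IPs in the first column, so any host names that change would still need editing by hand.

```python
def rewrite_hosts(hosts_text, ip_map):
    """Return hosts_text with old IPs (first column) replaced per ip_map."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Leave blank lines, comments and unknown addresses untouched.
        if fields and not fields[0].startswith("#") and fields[0] in ip_map:
            line = line.replace(fields[0], ip_map[fields[0]], 1)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical old -> new address mapping for one node.
    sample = "127.0.0.1 localhost\n10.0.0.11 nn1.example.com nn1\n"
    print(rewrite_hosts(sample, {"10.0.0.11": "192.168.5.11"}))
```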

Now, when the namenode starts, all it has is the metadata. It knows what files should be in HDFS, and what blocks belong to which files. But it has no information on where it can find those blocks. If you run fsck at this point, it will report back saying every file is corrupt. So don't do it; it will just generate unnecessary panic.

When a datanode starts, it scans its data directories, and makes a list of all the blocks it has. If you configured your cluster right, the datanode will then locate the namenode, and pass this block report on. After a few minutes, once all your datanodes are online, your namenode will report a full, healthy file system. You can run some sanity checks, and once you're satisfied, start the jobtracker and tasktrackers.
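
To make the point concrete, here is a toy model of that startup sequence. It is not HDFS code; the class `ToyNameNode`, the block IDs and the addresses are all invented. It only illustrates that block locations are rebuilt from whatever the datanodes report, so the address a datanode reports from does not matter.

```python
class ToyNameNode:
    """Keeps file -> block IDs (persisted) and rebuilds block locations."""

    def __init__(self, namespace):
        self.namespace = namespace  # what the saved image contains
        self.locations = {}         # block ID -> datanode addresses, never saved

    def block_report(self, datanode_address, block_ids):
        """Record every block a datanode says it holds."""
        for block_id in block_ids:
            self.locations.setdefault(block_id, set()).add(datanode_address)

    def missing_blocks(self):
        """Blocks named in the metadata that no datanode has reported yet."""
        all_blocks = {b for blocks in self.namespace.values() for b in blocks}
        return sorted(b for b in all_blocks if not self.locations.get(b))


if __name__ == "__main__":
    nn = ToyNameNode({"/data/a.log": ["blk_1", "blk_2"], "/data/b.log": ["blk_3"]})
    print(nn.missing_blocks())  # everything looks missing right after startup
    nn.block_report("192.168.5.21:50010", ["blk_1", "blk_3"])
    nn.block_report("192.168.5.22:50010", ["blk_2", "blk_3"])
    print(nn.missing_blocks())  # empty once the reports are in
```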

Good luck!
Marcos

On 18-04-2013 02:27, Bejoy Ks Wrote:
Adding on to the comments

You might need to update /etc/hosts with the new values.
If the host name changes as well, you may need to update fs.default.name and mapred.job.tracker with the new values.


On Thu, Apr 18, 2013 at 10:08 AM, Azuryy Yu <az...@gmail.com> wrote:
Changing the data nodes' names or IPs cannot cause data loss. As long as you keep the fsimage (under the namenode.data.dir) and all block data on the data nodes, everything can be recovered when you start the cluster.


On Thu, Apr 18, 2013 at 1:20 AM, Tom Brown <to...@gmail.com> wrote:
We have a situation where we want to physically move our small (4 node) cluster from one data center to another. As part of this move, each node will receive both a new FQN and a new IP address. As I understand it, HDFS is somehow tied to the FQN or IP address, and changing them causes data loss.

Is there any supported method of moving a cluster this way?

Thanks in advance!

--Tom




Re: Physically moving HDFS cluster to new

Posted by Bejoy Ks <be...@gmail.com>.
Adding on to the comments

You might need to update /etc/hosts with the new values.
If the host name changes as well, you may need to update
fs.default.name and mapred.job.tracker with the new values.
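
As a rough illustration of the config change, the sketch below rewrites one property in a Hadoop-style configuration XML string. The helper name `set_property` and the host names are made up. Note that ElementTree drops comments when it re-serializes, so treat this as a starting point rather than a drop-in tool.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml, name, value):
    """Return config_xml with the <value> of the named property replaced."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            prop.find("value").text = value
            break
    else:
        raise KeyError(name)  # property not present in this file
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical core-site.xml content with the old namenode host.
    core_site = (
        "<configuration><property>"
        "<name>fs.default.name</name>"
        "<value>hdfs://old-nn.example.com:8020</value>"
        "</property></configuration>"
    )
    print(set_property(core_site, "fs.default.name",
                       "hdfs://new-nn.example.com:8020"))
```

The same helper would apply to mapred.job.tracker in mapred-site.xml, with a host:port value instead of an hdfs:// URI.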



On Thu, Apr 18, 2013 at 10:08 AM, Azuryy Yu <az...@gmail.com> wrote:

> Changing the data nodes' names or IPs cannot cause data loss. As long as
> you keep the fsimage (under the namenode.data.dir) and all block data on
> the data nodes, everything can be recovered when you start the cluster.
>
>
> On Thu, Apr 18, 2013 at 1:20 AM, Tom Brown <to...@gmail.com> wrote:
>
>> We have a situation where we want to physically move our small (4 node)
>> cluster from one data center to another. As part of this move, each node
>> will receive both a new FQN and a new IP address. As I understand it, HDFS
>> is somehow tied to the FQN or IP address, and changing them causes data
>> loss.
>>
>> Is there any supported method of moving a cluster this way?
>>
>> Thanks in advance!
>>
>> --Tom
>>
>
>

Re: Physically moving HDFS cluster to new

Posted by Azuryy Yu <az...@gmail.com>.
Changing the data nodes' names or IPs cannot cause data loss. As long as
you keep the fsimage (under the namenode.data.dir) and all block data on
the data nodes, everything can be recovered when you start the cluster.
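
A quick pre-move inventory along those lines might look like the sketch below. It assumes the usual on-disk layout, where the namenode directory has a `current/` subdirectory holding files whose names start with `fsimage`, and datanode directories hold block files named `blk_*` next to `.meta` checksum files. Check that against your own version before relying on it. The function names are invented.

```python
import os


def has_fsimage(name_dir):
    """True if name_dir/current contains at least one fsimage file."""
    current = os.path.join(name_dir, "current")
    return os.path.isdir(current) and any(
        f.startswith("fsimage") for f in os.listdir(current))


def count_blocks(data_dir):
    """Count block files (blk_*, skipping .meta checksums) under data_dir."""
    total = 0
    for _root, _dirs, files in os.walk(data_dir):
        total += sum(1 for f in files
                     if f.startswith("blk_") and not f.endswith(".meta"))
    return total
```

Recording the block count per datanode before the move gives you something to compare against once the machines are back up.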


On Thu, Apr 18, 2013 at 1:20 AM, Tom Brown <to...@gmail.com> wrote:

> We have a situation where we want to physically move our small (4 node)
> cluster from one data center to another. As part of this move, each node
> will receive both a new FQN and a new IP address. As I understand it, HDFS
> is somehow tied to the FQN or IP address, and changing them causes data
> loss.
>
> Is there any supported method of moving a cluster this way?
>
> Thanks in advance!
>
> --Tom
>

Re: Physically moving HDFS cluster to new

Posted by Rajiv Chittajallu <ra...@yahoo-inc.com>.

On 4/17/13 7:23 PM, "Ted Dunning" <td...@maprtech.com> wrote:

>
>It may or may not help you in your current distress, but MapR's
>distribution could handle this pretty easily.
>
>
>One method is direct distcp between clusters, but you could also use
>MapR's mirroring capabilities to migrate data.
>
>
>You can also carry a MapR cluster, change the IP addresses and relight
>the cluster without data loss.  You can also move disks (respecting
>RAID-0 disk groups, of course) from machine to machine within a cluster
>and have them wake up with all file
> and directory meta-data intact.
>
>
>Furthermore, you can lose any two machines in a cluster and are
>guaranteed to be able to reconstruct the cluster.  Even if you lose all
>three replicas of *any* of the data or meta-data in the cluster, you can
>*still* reconstruct any data volumes
>for which at least one copy survives.  If the lost data volumes come
>back at a later time, you will also be able to resurrect the data correctly.
>
>
>None of this is true for any of the other major Hadoop distributions.


This is not correct. In HDFS, as long as the fsimage and a copy of each data
block are kept intact, you can make any changes to the nodes.


>
>Let me know if you want to try this out.
>
>
>On Wed, Apr 17, 2013 at 5:20 PM, Tom Brown
><to...@gmail.com> wrote:
>
>We have a situation where we want to physically move our small (4 node)
>cluster from one data center to another. As part of this move, each node
>will receive both a new FQN and a new IP address. As I understand it,
>HDFS is somehow tied to the
>FQN or IP address, and changing them causes data loss.
>
>
>Is there any supported method of moving a cluster this way?
>
>
>
>Thanks in advance!
>
>
>--Tom
>
>
>
>
>
>


Re: Physically moving HDFS cluster to new

Posted by Ted Dunning <td...@maprtech.com>.
It may or may not help you in your current distress, but MapR's
distribution could handle this pretty easily.

One method is direct distcp between clusters, but you could also use MapR's
mirroring capabilities to migrate data.

You can also carry a MapR cluster, change the IP addresses and relight the
cluster without data loss.  You can also move disks (respecting RAID-0 disk
groups, of course) from machine to machine within a cluster and have them
wake up with all file and directory meta-data intact.

Furthermore, you can lose any two machines in a cluster and are guaranteed
to be able to reconstruct the cluster.  Even if you lose all three replicas
of *any* of the data or meta-data in the cluster, you can *still*
reconstruct any data volumes for which at least one copy survives.  If the
lost data volumes come back at a later time, you will also be able
to resurrect the data correctly.

None of this is true for any of the other major Hadoop distributions.

Let me know if you want to try this out.





On Wed, Apr 17, 2013 at 5:20 PM, Tom Brown <to...@gmail.com> wrote:

> We have a situation where we want to physically move our small (4 node)
> cluster from one data center to another. As part of this move, each node
> will receive both a new FQN and a new IP address. As I understand it, HDFS
> is somehow tied to the FQN or IP address, and changing them causes data
> loss.
>
> Is there any supported method of moving a cluster this way?
>
> Thanks in advance!
>
> --Tom
>
