Posted to user@hbase.apache.org by Patrick Schless <pa...@tempo-db.com> on 2012/07/06 01:44:16 UTC

Migrating Clusters - Broken Metadata

I have an existing HBase cluster (old.domain.com) and I am trying to
migrate the data to a new set of boxes (new.domain.com). Both are running
HBase 0.90.x.

I would like to minimize downtime, so I'm looking at the Backup tool from
Mozilla (http://blog.mozilla.org/data/2011/02/04/migrating-hbase-in-the-trenches/)
to do incremental syncs, followed by a final sync and failover.

I am able to do the sync, but when I try to bring up the new HMaster, I get
errors along the lines of:
2012-07-05 23:26:27,674 ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of data02.old.domain.com
2012-07-05 23:26:27,675 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: hostname can't be null
        at java.net.InetSocketAddress.<init>(InetSocketAddress.java:139)
        at org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
        at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:64)
        at org.apache.hadoop.hbase.catalog.MetaReader.readLocation(MetaReader.java:344)
        at org.apache.hadoop.hbase.catalog.MetaReader.readMetaLocation(MetaReader.java:286)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:338)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:540)
        at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:474)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:412)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:301)

Before that error, I do see the new.domain.com region servers register
successfully with the HMaster.

I tried (temporarily) adding the data*.old.domain.com hosts to /etc/hosts,
but that did not fix the issue; startup just failed with a different error. [1]

The comments on the post linked above say HBase will figure out that the
region servers have changed, but that doesn't seem to be the case. Is there
something I can do to fix the metadata after copying the files? Is there
some other tool or strategy I should be using for this? I would love to use
replication, but it's documented as not production-ready in 0.90, so I'm
avoiding it.

Thanks,
Patrick

[1] With /etc/hosts entries for data*.old.domain.com:
2012-07-05 23:41:57,130 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- assigned=0, rit=false, location=data02.new.domain.com:60020
2012-07-05 23:41:57,148 INFO org.apache.hadoop.hbase.master.HMaster: .META. assigned=0, rit=false, location=data02.old.domain.com:60020
2012-07-05 23:41:57,148 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover
2012-07-05 23:41:57,150 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:761)
        at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:227)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:425)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:301)

Re: Migrating Clusters - Broken Metadata

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Fri, Jul 6, 2012 at 1:52 PM, Patrick Schless <pa...@tempo-db.com> wrote:
>
> I still have no idea where the new hbase is getting the references to the
> old nodes..

It's coming from .META. where the old assignment is retained.

>
> Filed a bug report: https://issues.apache.org/jira/browse/HBASE-6343

I added my observations; this bug regarding the
IllegalArgumentException only exists in 0.90.

J-D

Re: Migrating Clusters - Broken Metadata

Posted by Patrick Schless <pa...@tempo-db.com>.
For the archives...

I tried a ton of stuff to fix this issue and was never able to find out
where the references to the old nodes were coming from (I deleted /hbase
from ZooKeeper, deleted -ROOT- and .META., did a clean shutdown of the
source before the copy, etc.). No matter what, the copied data would always
reach out to the old nodes.

The hack to work around it was to add entries for the old data nodes in
/etc/hosts, pointing to 127.0.0.2, as in:
127.0.0.2 data01.old.domain.com data02.old.domain.com data03.old.domain.com

After bringing up the new HBase (and letting it fail after trying to
connect to the old nodes), I was able to remove the /etc/hosts entries and
bounce HBase without any problem.
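
The workaround presumably depends on the old names resolving on each new node, so HBase gets an address back instead of a null hostname. A minimal pre-flight sketch for checking that before starting the master, assuming Java 9+ (for List.of); OldHostCheck and resolveAll are hypothetical names, and the default host list is just the three old data nodes named above. InetAddress.getByName goes through the system resolver, which on a typical Linux box consults /etc/hosts.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OldHostCheck {

    // Maps each host name to its resolved IP string, or to null if it does not resolve.
    static Map<String, String> resolveAll(List<String> hosts) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                result.put(host, InetAddress.getByName(host).getHostAddress());
            } catch (UnknownHostException e) {
                result.put(host, null);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Default to the old data nodes from the thread; pass other names as arguments.
        List<String> hosts = args.length > 0
                ? List.of(args)
                : List.of("data01.old.domain.com", "data02.old.domain.com", "data03.old.domain.com");
        resolveAll(hosts).forEach((host, ip) ->
                System.out.println(host + " -> " + (ip == null ? "UNRESOLVED" : ip)));
    }
}
```

Because the JVM caches name lookups, it is safest to run a check like this as a fresh process after editing /etc/hosts rather than from a long-lived one.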

I still have no idea where the new HBase is getting the references to the
old nodes.

Filed a bug report: https://issues.apache.org/jira/browse/HBASE-6343
