Posted to user@hbase.apache.org by Liu Yan <gz...@gmail.com> on 2009/02/28 04:15:55 UTC

HBase (Master) Migration

hi,

We have a 4-node cluster running Hadoop 0.19.0 and HBase 0.19.0. We run the
NameNode and a RegionServer on the same server, and we have created a bunch of
tables in HBase.

Now we want to use another (more powerful) machine to replace the old
master. Here is what we did:

1) Shut down HBase and Hadoop.
2) Copy all the Hadoop-related files from the old master to the new master.
3) Reconfigure Hadoop and HBase so that all nodes (including the master and
clients) now point to the new master.
4) Start the Hadoop cluster. (This seems fine.)
5) Start the HBase cluster. (This seems fine too.)

Then when we try to do a "count" in HBase shell, (e.g. count 'table_name'),
we hit the following problem:

09/02/27 21:53:04 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 0 time(s).
09/02/27 21:53:05 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 1 time(s).
09/02/27 21:53:06 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 2 time(s).
09/02/27 21:53:06 INFO ipc.HbaseRPC: Server at /10.249.190.85:60020 not available yet, Zzzzz...
09/02/27 21:53:06 INFO ipc.HbaseRPC: Server at /10.249.190.85:60020 could not be reached after 1 tries, giving up.
09/02/27 21:53:09 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 0 time(s).
09/02/27 21:53:10 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 1 time(s).
09/02/27 21:53:11 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 2 time(s).

The IP address shown here is the old master's, not the new one's.

We tried the "list" and "scan" commands in the HBase shell, and both work
fine. Only "count" reports the above error.

What's the problem here?

Thanks,
Yan
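
A quick way to confirm which address the client keeps retrying is to pull the host:port pairs out of the client log. The following is a minimal Python sketch, not part of HBase: the function name `retried_servers` and the regular expression are assumptions based only on the log format quoted in this message, and the pattern tolerates an optional line wrap after the slash. The value of `new_master_ip` is the address identified as the new master elsewhere in this thread.

```python
import re
from collections import Counter

# Matches "Retrying connect to server: /<ip>:<port>", tolerating an
# optional line wrap (as inserted by a mail client) after the slash.
RETRY_RE = re.compile(
    r"Retrying connect to server:\s*/\s*(\d+(?:\.\d+){3}):(\d+)"
)

def retried_servers(log_text):
    """Count retry attempts per (ip, port) found in client log text."""
    return Counter((ip, int(port)) for ip, port in RETRY_RE.findall(log_text))

sample = """09/02/27 21:53:04 INFO ipc.HBaseClass: Retrying connect to server: /
10.249.190.85:60020. Already tried 0 time(s).
09/02/27 21:53:05 INFO ipc.HBaseClass: Retrying connect to server: /
10.249.190.85:60020. Already tried 1 time(s).
"""

new_master_ip = "10.254.51.127"  # new master's address, as reported in this thread
for (ip, port), n in retried_servers(sample).items():
    status = "expected" if ip == new_master_ip else "stale?"
    print(f"{ip}:{port} retried {n} time(s) [{status}]")
```

Any address flagged "stale?" is one the client learned from somewhere other than the new configuration, which narrows down where to look next.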

Re: HBase (Master) Migration

Posted by Liu Yan <gz...@gmail.com>.
I stopped and restarted HBase. The "count" command now seems to be
automagically working again (the count has not finished yet, but it seems to be
producing good output). I don't think I did anything except:

1) Enable DEBUG per Stack's suggestion
2) After starting HBase, wait a little longer (since I was watching the
log file :-)

I didn't see the old IP address appear in the log file at all. The only thing
that caught my eye is this:

{{{
2009-02-27 23:46:29,158 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: 10.254.51.127:60020}
2009-02-27 23:46:29,191 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of 1001_profiles,,1235713972403 is not valid;  serverInfo: address: 10.254.51.127:60020, startcode: 1235796330999, load: (requests=0, regions=2, usedHeap=29, maxHeap=888), passed startCode: 1235789499358, storedInfo.startCode: 1235796330999
2009-02-27 23:46:29,194 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of 1001_profiles,113161088459795286,1235713972403 is not valid;  serverInfo: address: 10.254.51.127:60020, startcode: 1235796330999, load: (requests=0, regions=2, usedHeap=29, maxHeap=888), passed startCode: 1235789499358, storedInfo.startCode: 1235796330999
}}}

Does this mean that some blocks are invalid, and that we need to wait a while
until they are properly replicated before everything is good again?

Thanks,
Yan
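
The DEBUG lines above appear to concern region assignments recorded in .META. rather than HDFS blocks: each one reports a `passed startCode` that differs from `storedInfo.startCode`, and the passed value matches the `info:serverstartcode` seen in the .META. scan elsewhere in this thread. The Python sketch below only extracts and compares those fields; it is not HBase's actual code. `parse_assignment_check` and `is_stale` are hypothetical helpers, and the reading that a mismatch marks an assignment left over from an earlier server start is an assumption drawn from the message text.

```python
import re

# Field names are taken from the DEBUG message; the parsing is illustrative only.
CHECK_RE = re.compile(
    r"Current assignment of (?P<region>\S+) is not\s+valid;"
    r".*?address: (?P<address>[\d.]+:\d+)"
    r".*?passed startCode:\s*(?P<passed>\d+)"
    r".*?storedInfo\.startCode:\s*(?P<stored>\d+)",
    re.DOTALL,
)

def parse_assignment_check(message):
    """Return the region, address and both start codes, or None if no match."""
    m = CHECK_RE.search(message)
    if m is None:
        return None
    return {
        "region": m.group("region"),
        "address": m.group("address"),
        "passed": int(m.group("passed")),
        "stored": int(m.group("stored")),
    }

def is_stale(check):
    """Assumption: differing start codes mean the assignment is out of date."""
    return check["passed"] != check["stored"]

message = (
    "Current assignment of 1001_profiles,,1235713972403 is not valid; "
    "serverInfo: address: 10.254.51.127:60020, startcode: 1235796330999, load: "
    "(requests=0, regions=2, usedHeap=29, maxHeap=888), passed startCode: "
    "1235789499358, storedInfo.startCode: 1235796330999"
)
check = parse_assignment_check(message)
print(check["region"], "stale:", is_stale(check))
```

Under that reading, the address is already the new one and only the start code is out of date, which would be consistent with the master reassigning these regions shortly after startup.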


Re: HBase (Master) Migration

Posted by stack <st...@duboce.net>.
Stop and start hbase.

Watch the master log as it starts up.

Try to figure out why it is not judging regions that have the old server IPs
as bad.

Enable DEBUG before you restart.  The extra info might help (see FAQ on wiki
for how).

St.Ack


Re: HBase (Master) Migration

Posted by Liu Yan <gz...@gmail.com>.
When I do "scan '.META.'", I see some interesting output:

{{{
 1002_profiles,71392263984443021,1235657605714 column=info:server, timestamp=1235789710023, value=10.254.51.127:60020
 1002_profiles,71392263984443021,1235657605714 column=info:serverstartcode, timestamp=1235789710023, value=1235789499358
 1002_profiles,73991923385349818,1235657605714 column=historian:assignment, timestamp=1235789558647, value=Region assigned to server 10.254.51.127:60020
 1002_profiles,73991923385349818,1235657605714 column=historian:open, timestamp=1235789577850, value=Region opened on server : hmaster
}}}

The IP address here is correct, pointing to the new master's IP.

But I also see the following:

{{{
 1002_profiles,73991923385349818,1235657605714 column=info:server, timestamp=1235789577848, value=10.254.51.127:60020
 1002_profiles,73991923385349818,1235657605714 column=info:serverstartcode, timestamp=1235789577848, value=1235789499358
 1002_profiles,75728171588183981,1235242656324 column=historian:assignment, timestamp=1235297600858, value=Region assigned to server 10.249.190.85:60020
 1002_profiles,75728171588183981,1235242656324 column=historian:open, timestamp=1235297623082, value=Region opened on server : hmaster
}}}

This is our old master's IP.

How can we fix this?

Regards,
Yan
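
To see at a glance where the old address still appears, the scan cells can be grouped by column. The following Python sketch is illustrative only: the `cells` list is transcribed by hand from the output in this message (with wrapped row keys and values rejoined, which is an assumption about how the shell wrapped them), and `columns_mentioning` is a hypothetical helper.

```python
from collections import defaultdict

# (row, column, timestamp, value), transcribed from the scan output.
cells = [
    ("1002_profiles,73991923385349818,1235657605714",
     "info:server", 1235789577848, "10.254.51.127:60020"),
    ("1002_profiles,73991923385349818,1235657605714",
     "info:serverstartcode", 1235789577848, "1235789499358"),
    ("1002_profiles,73991923385349818,1235657605714",
     "historian:assignment", 1235789558647,
     "Region assigned to server 10.254.51.127:60020"),
    ("1002_profiles,75728171588183981,1235242656324",
     "historian:assignment", 1235297600858,
     "Region assigned to server 10.249.190.85:60020"),
]

def columns_mentioning(cells, address):
    """Map each column name to the (row, timestamp) pairs whose value contains address."""
    hits = defaultdict(list)
    for row, column, timestamp, value in cells:
        if address in value:
            hits[column].append((row, timestamp))
    return dict(hits)

old_ip = "10.249.190.85"
for column, rows in columns_mentioning(cells, old_ip).items():
    print(column, rows)
```

In the excerpt shown, the old address occurs only in a `historian:assignment` cell with an earlier timestamp than the other cells, while the `info:server` cells carry the new address. Whether other rows still hold the old address in `info:server` cannot be told from this excerpt.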


Re: HBase (Master) Migration

Posted by stack <st...@duboce.net>.
If scan is working, do 'scan ".META."'.

There are three columns: info:regioninfo, info:serverstartcode, and
info:server.

What do you see for info:server?  New addresses or the old?

On startup, hbase should be judging the content of .META. as sour and
reassigning regions to the servers that have just registered; i.e. those of
the new addresses.

St.Ack
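
The startup behaviour described above can be pictured as a comparison between what .META. records and which servers have just registered. The Python sketch below is a simplified model under stated assumptions, not HBase's implementation: `regions_to_reassign` is a hypothetical function, the dictionaries stand in for `info:server` / `info:serverstartcode` and the master's list of live servers, and the two `example_table` rows are made-up values for illustration. The first row uses the address and start codes reported in the DEBUG log elsewhere in this thread.

```python
def regions_to_reassign(meta_rows, live_servers):
    """Return regions whose recorded (address, startcode) matches no live server.

    meta_rows:    {region_name: (server_address, startcode)}  # as recorded in .META.
    live_servers: {server_address: startcode}                 # servers that just registered
    """
    stale = []
    for region, (address, startcode) in meta_rows.items():
        # Unknown address, or known address with a different start code.
        if live_servers.get(address) != startcode:
            stale.append(region)
    return sorted(stale)

live_servers = {"10.254.51.127:60020": 1235796330999}

meta_rows = {
    # Same address, but recorded under an earlier server start.
    "1001_profiles,,1235713972403": ("10.254.51.127:60020", 1235789499358),
    # Hypothetical row still pointing at the old master's address.
    "example_table,,1": ("10.249.190.85:60020", 1235000000000),
    # Hypothetical row that is already current.
    "example_table,,2": ("10.254.51.127:60020", 1235796330999),
}

print(regions_to_reassign(meta_rows, live_servers))
```

Keying the check on the start code as well as the address is what lets a restarted server at the same address be told apart from its previous incarnation.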

