Posted to dev@hbase.apache.org by James Kennedy <ja...@troove.net> on 2011/01/13 20:37:58 UTC

How to handle data migration?

I'm currently validating the new 0.90.0 RC3 with the hbase-trx layer and our own application.

All seems well so far except for the fact that I now find that HBase doesn't adapt if I try to run the same data on different machines.

e.g.
1) I work from home and generated our seeded test data.
2) Run the test suite and all tests pass
3) I go to the office and re-run the tests.

Result: HMaster fails because the -ROOT- data holds the wrong IP address for the .META. location. At least that is my understanding from the stack trace below.  Note that the 192.168.1.102 address in that trace is the IP from my home network and is incorrect.

This wasn't an issue with previous versions of HBase, as far as I've noticed, and it seems like a significant data-portability problem.
Surely the HMaster should be able to absorb stale metadata and wait for new region servers to check in.
Instead it just keels over and dies.
But before logging a case, I wanted to know if there is something I'm obviously missing or doing wrong.

The seeded test data is on HDFS.

Thoughts?
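For reference, the address the master chases here lives in the -ROOT- catalog table, in the info:server column of the .META. row (at least under the 0.90 catalog layout as I understand it). What follows is only a rough sketch of dumping that column with the client API; it assumes the -ROOT- region itself is deployed and reachable, and the table and column names are the stock 0.90 conventions rather than anything specific to this setup.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DumpRootCatalog {
      public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // -ROOT- is the catalog table that records where .META. is deployed.
        HTable root = new HTable(conf, "-ROOT-");
        ResultScanner scanner = root.getScanner(new Scan());
        try {
          for (Result r : scanner) {
            // info:server holds the host:port of the server believed to carry .META.
            byte[] server = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("server"));
            System.out.println(Bytes.toString(r.getRow()) + " -> "
                + (server == null ? "(unassigned)" : Bytes.toString(server)));
          }
        } finally {
          scanner.close();
          root.close();
        }
      }
    }

If the stale 192.168.1.102 shows up in info:server, the old location is being read back out of the catalog data sitting on HDFS rather than out of anything machine-local.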


[13/01/11 10:58:42] 5939   [           main] INFO  ion.service.HBaseRegionService  - troove> Starting region server thread.
[13/01/11 11:00:15] 98699  [        HMaster] FATAL he.hadoop.hbase.master.HMaster  - Unhandled exception. Starting shutdown.
java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=192.168.1.102/192.168.1.102:60020]
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:311)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:865)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:732)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:258)
	at $Proxy15.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:384)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:283)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:478)
	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:435)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:382)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:277)
	at java.lang.Thread.run(Thread.java:680)


James Kennedy
Troove Inc.



Re: How to handle data migration?

Posted by Stack <sa...@gmail.com>.
I will dig in on Monday, James. If it's a cluster restart, then deleting the state up in zk is fine; the restart will run without the previous state. Deleting state from zk is bad on a running cluster: it will more than likely mess it up, since the regions in transition kept up in zk are erased.

Stack
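
For the archives, here is a minimal sketch of what wiping HBase's state in zk can look like with the plain ZooKeeper client, under exactly the caveat above: only with the cluster fully stopped, since a running cluster keeps regions-in-transition state under these znodes. The localhost:2181 quorum and the /hbase parent znode are assumptions matching a default standalone setup, not anything confirmed in this thread.

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.Watcher.Event.KeeperState;
    import org.apache.zookeeper.ZooKeeper;

    public class WipeHBaseZkState {
      public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // Assumed quorum for a default standalone setup; match hbase.zookeeper.quorum.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {
          if (event.getState() == KeeperState.SyncConnected) {
            connected.countDown();
          }
        });
        connected.await();
        // Default zookeeper.znode.parent is /hbase; everything HBase keeps in zk lives under it.
        deleteRecursively(zk, "/hbase");
        zk.close();
      }

      // ZooKeeper only deletes empty znodes, so remove children depth-first.
      static void deleteRecursively(ZooKeeper zk, String path) throws Exception {
        if (zk.exists(path, false) == null) {
          return;
        }
        for (String child : zk.getChildren(path, false)) {
          deleteRecursively(zk, path + "/" + child);
        }
        zk.delete(path, -1); // -1 ignores the znode version
      }
    }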



On Jan 14, 2011, at 10:52, James Kennedy <ja...@troove.net> wrote:

> Negative. I deleted the zookeeper dir and HMaster still managed to pull the wrong IP address from somewhere.
> 
> I don't have a lot of time to really investigate this myself, but I'll try to reproduce it with a basic test and log a case for it.
> 
> By the way, can someone clarify the side effects of deleting the zookeeper dir like that? I assume it has no ill effect on the data itself, especially when the cluster is down. But what is the worst that can happen if you delete the dir while the cluster is running?
> 
> Thanks
> 
> James

Re: How to handle data migration?

Posted by James Kennedy <ja...@troove.net>.
Negative. I deleted the zookeeper dir and HMaster still managed to pull the wrong IP address from somewhere.

I don't have a lot of time to really investigate this myself, but I'll try to reproduce it with a basic test and log a case for it.

By the way, can someone clarify the side effects of deleting the zookeeper dir like that? I assume it has no ill effect on the data itself, especially when the cluster is down. But what is the worst that can happen if you delete the dir while the cluster is running?

Thanks

James
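
As an aside, when it is HBase's managed ZooKeeper being wiped, the directory in question is whatever hbase.zookeeper.property.dataDir points at. A quick way to double-check which directories a given setup actually uses is to print the loaded configuration; the property names below are the standard ones, and nothing here is specific to this installation.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ShowZkDirs {
      public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        System.out.println("hbase.zookeeper.quorum           = " + conf.get("hbase.zookeeper.quorum"));
        System.out.println("hbase.zookeeper.property.dataDir = " + conf.get("hbase.zookeeper.property.dataDir"));
        System.out.println("hbase.rootdir                    = " + conf.get("hbase.rootdir"));
      }
    }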

On 2011-01-14, at 9:54 AM, Stack wrote:

> It does seem like a regression. If you kill the zk data dir and restart the cluster, does it work? (The root location is up in zk.)
> 
> 
> Stack


Re: How to handle data migration?

Posted by Stack <sa...@gmail.com>.
It does seem like a regression. If you kill the zk data dir and restart the cluster, does it work? (The root location is up in zk.)


Stack
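
Since the root location is the piece that lives up in zk, one way to see exactly what is recorded there is to read the znode directly. The sketch below uses the plain ZooKeeper client; the /hbase/root-region-server path matches the default 0.90 znode layout as far as I know, and the payload encoding is not guaranteed to be a plain string, so it is only printed for eyeballing.

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.Watcher.Event.KeeperState;
    import org.apache.zookeeper.ZooKeeper;

    public class ShowRootLocation {
      public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // Assumed quorum for a default standalone setup; match hbase.zookeeper.quorum.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {
          if (event.getState() == KeeperState.SyncConnected) {
            connected.countDown();
          }
        });
        connected.await();
        // Default layout: zookeeper.znode.parent=/hbase, root server under root-region-server.
        byte[] data = zk.getData("/hbase/root-region-server", false, null);
        System.out.println(data == null ? "(no data)" : new String(data, "UTF-8"));
        zk.close();
      }
    }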


