Posted to issues@hbase.apache.org by "James Kennedy (JIRA)" <ji...@apache.org> on 2011/01/14 20:14:46 UTC

[jira] Created: (HBASE-3445) Master crashes on when data moved to different host

Master crashes on when data moved to different host
---------------------------------------------------

                 Key: HBASE-3445
                 URL: https://issues.apache.org/jira/browse/HBASE-3445
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.90.0
            Reporter: James Kennedy
            Priority: Critical
             Fix For: 0.90.0


While testing an upgrade to 0.90.0 RC3, I noticed that if I seeded our test data on one machine and transferred it to another machine, the HMaster on the new machine dies on startup.

Based on the following stack trace, it looks as though the master is attempting to find the .meta region using the IP address of the original machine.  Instead of waiting around for RegionServers to register with new location data, HMaster throws its hands up with a FATAL exception.

Note that deleting the ZooKeeper dir makes no difference.

Also note that so far I have only reproduced this in my own environment, using the hbase-trx extension of HBase and an ApplicationStarter that starts the Master and RegionServer together in the same JVM.  While the issue seems likely isolated from those factors, it is far from a vanilla HBase environment.

I will spend some time trying to reproduce the issue in a proper HBase test, but perhaps someone can beat me to it?  How do I simulate the IP switch? It may require a data.tar upload.

[14/01/11 10:45:20] 6396   [     Thread-298] ERROR server.quorum.QuorumPeerConfig  - Invalid configuration, only one server specified (ignoring)
[14/01/11 10:45:21] 7178   [           main] INFO  ion.service.HBaseRegionService  - troove> region port:       60010
[14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> region interface:  org.apache.hadoop.hbase.ipc.IndexedRegionInterface
[14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> root dir: hdfs://localhost:8701/hbase
[14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> Initializing region server.
[14/01/11 10:45:21] 7631   [           main] INFO  ion.service.HBaseRegionService  - troove> Starting region server thread.
[14/01/11 10:46:54] 100764 [        HMaster] FATAL he.hadoop.hbase.master.HMaster  - Unhandled exception. Starting shutdown.
java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=192.168.1.102/192.168.1.102:60020]
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:311)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:865)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:732)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:258)
	at $Proxy14.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:384)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:283)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:478)
	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:435)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:382)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:277)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HBASE-3445:
---------------------------------

    Fix Version/s:     (was: 0.90.1)
                   0.90.0

Actually, let me qualify that last statement.  By "swallow" I didn't mean to imply that the exceptions should be completely silent. In fact, some WARN output in that CatalogTracker exception handling would make sense.

Something like:

"Unable to connect to .meta region at 192.168.1.2:60020. Waiting for RegionServers to update location data."




[jira] Commented: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982756#action_12982756 ] 

stack commented on HBASE-3445:
------------------------------

Yeah.  It starts to tend in that direction, James.  I think the set of handling that is over in AM is pretty good -- it's more prone to failures than the bit of code you've been massaging.  Let me commit your patch.




[jira] Updated: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-3445:
-------------------------------

    Fix Version/s:     (was: 0.90.0)
                   0.90.1




[jira] Updated: (HBASE-3445) Master crashes on when data moved to different host

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HBASE-3445:
---------------------------------


Instead of wrestling with a test, I did some debugging.  I can fix this issue with the attached patch.
I'll leave it up to you guys to decide whether that's the right fix, whether there are more exceptions to be considered, etc.

But from my narrow scope of understanding, it seems that the CatalogTracker SHOULD swallow exceptions like SocketTimeoutException instead of letting them propagate.
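A minimal sketch of the pattern being proposed (not the actual patch; the interface and method names below are illustrative stand-ins, not HBase APIs): catch the SocketTimeoutException at the point where the cached meta location is verified and return null, so the caller treats the location as unverified rather than letting the exception reach the master's fatal-exception handler.

```java
import java.net.SocketTimeoutException;

// Sketch of "swallow the timeout" at the location-verification step.
// All names are hypothetical stand-ins for the CatalogTracker code paths.
public class MetaLocationCheck {

    /** Stand-in for the connection lookup that times out on a stale host. */
    interface ConnectionSupplier {
        Object connect() throws SocketTimeoutException;
    }

    /**
     * Returns the connection, or null if the cached location timed out.
     * Catching SocketTimeoutException here keeps it from propagating up
     * to the master, which would otherwise abort with a FATAL exception.
     */
    static Object verifyLocation(ConnectionSupplier supplier) {
        try {
            return supplier.connect();
        } catch (SocketTimeoutException e) {
            // Not silent: WARN-level output, per the earlier comment.
            System.err.println("WARN: unable to connect to cached .META. location ("
                + e.getMessage() + "); waiting for RegionServers to report in.");
            return null; // caller treats the location as unverified and retries
        }
    }
}
```

With this shape, a timeout against the old machine's address becomes a retryable "location unknown" rather than a startup-killing error.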





[jira] Updated: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3445:
-------------------------

    Attachment: 3445-refactor.txt

I started in on a refactor of AM#unassign to move all of the try/catch out to a class that could be reused in places such as the CatalogTracker, around the getCachedConnection call where James ran into his issue.  Turns out this is the wrong direction; the two locations have different exception-throwing characteristics.  I'm abandoning this tack but attaching the patch anyway.  Let me write a unit test to reproduce the James case to go along with his patch, and see if I can generate other exceptions at the getCachedConnection juncture.




[jira] Commented: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981940#action_12981940 ] 

ryan rawson commented on HBASE-3445:
------------------------------------

Thanks for the good debugging work. I'm going to place this in 0.90.1, and someone will review it soon.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982420#action_12982420 ] 

stack commented on HBASE-3445:
------------------------------

James:

In the AssignmentManager, where we make an RPC to a remote regionserver, we do the following:

{code}
    } catch (ConnectException e) {
      LOG.info("Failed connect to " + server + ", message=" + e.getMessage() +
        ", region=" + region.getEncodedName());
      // Presume that regionserver just failed and we haven't got expired
      // server from zk yet.  Let expired server deal with clean up.
    } catch (java.net.SocketTimeoutException e) {
      LOG.info("Server " + server + " returned " + e.getMessage() + " for " +
        region.getEncodedName());
      // Presume retry or server will expire.
    } catch (EOFException e) {
      LOG.info("Server " + server + " returned " + e.getMessage() + " for " +
        region.getEncodedName());
      // Presume retry or server will expire.
    } catch (RemoteException re) {
      IOException ioe = re.unwrapRemoteException();
      if (ioe instanceof NotServingRegionException) {
        // Failed to close, so pass through and reassign
        LOG.debug("Server " + server + " returned " + ioe + " for " +
          region.getEncodedName());
      } else if (ioe instanceof EOFException) {
        // Failed to close, so pass through and reassign
        LOG.debug("Server " + server + " returned " + ioe + " for " +
          region.getEncodedName());
      } else {
        this.master.abort("Remote unexpected exception", ioe);
      }
    } catch (Throwable t) {
{code}

I think your addition of the SocketTimeoutException catch to the try/catch in getCachedConnection is right.  Maybe we should add ConnectException too? Unless you object, I'll add it when I commit your patch.
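For illustration, a minimal sketch of the pattern being discussed (the ConnectionFactory helper below is invented for this example and is not the HBase API; the idea is just that connect-phase failures against a cached, possibly stale server location are swallowed and reported as "not verified" so the caller retries, instead of propagating up and aborting the master):

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class ConnectionVerifier {

  // Hypothetical stand-in for the single connect call made inside
  // CatalogTracker.getCachedConnection.
  interface ConnectionFactory {
    Object connect() throws IOException;
  }

  // Returns null ("cached location could not be verified, retry later")
  // on connect-phase failures instead of letting them propagate and
  // abort the master with a FATAL.
  static Object getConnectionOrNull(ConnectionFactory factory) throws IOException {
    try {
      return factory.connect();
    } catch (SocketTimeoutException e) {
      // Server at the cached address did not answer in time; presume stale.
      return null;
    } catch (ConnectException e) {
      // Nothing listening at the cached address; also presume stale.
      return null;
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulates the stale-IP case from the stack trace above.
    Object conn = getConnectionOrNull(new ConnectionFactory() {
      public Object connect() throws IOException {
        throw new SocketTimeoutException("20000 millis timeout");
      }
    });
    System.out.println(conn == null ? "retry later" : "connected");
  }
}
```

Any other IOException still propagates, so genuinely unexpected failures remain visible to the master.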

> Master crashes on data that was moved from different host
> ---------------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 3445_0.90.0.patch
>



[jira] Assigned: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-3445:
----------------------------

    Assignee: James Kennedy

Made James a contributor and assigned him this issue

> Master crashes on data that was moved from different host
> ---------------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Assignee: James Kennedy
>            Priority: Critical
>             Fix For: 0.90.1
>
>         Attachments: 3445_0.90.0.patch
>



[jira] Updated: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3445:
-------------------------

    Fix Version/s:     (was: 0.90.0)
                   0.90.1

Moved to 0.90.1

> Master crashes on data that was moved from different host
> ---------------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Priority: Critical
>             Fix For: 0.90.1
>
>         Attachments: 3445_0.90.0.patch
>



[jira] Updated: (HBASE-3445) Master crashes on when data moved to different host

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HBASE-3445:
---------------------------------

    Attachment: 3445_0.90.0.patch

> Master crashes on when data moved to different host
> ---------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 3445_0.90.0.patch
>



[jira] Resolved: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3445.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to branch and trunk.  Thanks for the patch James.

> Master crashes on data that was moved from different host
> ---------------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.1
>
>         Attachments: 3445-refactor.txt, 3445-v2.txt, 3445_0.90.0.patch
>



[jira] Updated: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HBASE-3445:
---------------------------------

    Assignee: stack  (was: James Kennedy)

Yeah probably.  I wonder if a better question is "what exceptions do we NOT want to catch so that master dies with a FATAL?"
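That framing can be sketched as an explicit whitelist (illustrative only; the class and method names below are invented, not HBase code): connect-phase exceptions that plausibly mean a stale or soon-to-expire server location are retryable, and anything outside the whitelist still aborts the master.

```java
import java.io.EOFException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class FatalPolicy {

  // Connect-phase failures that likely mean a stale or expired server
  // location (the same set the AssignmentManager snippet above tolerates):
  // safe to retry, so the master should NOT abort on them.
  static boolean isRetryable(Throwable t) {
    return t instanceof ConnectException
        || t instanceof SocketTimeoutException
        || t instanceof EOFException;
  }

  public static void main(String[] args) {
    System.out.println(isRetryable(new SocketTimeoutException("timeout"))); // true
    System.out.println(isRetryable(new IllegalStateException("bug")));      // false
  }
}
```

Everything not on the whitelist — programming errors, corrupt state — still surfaces as FATAL, which answers the "what do we NOT catch" question by inversion.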


> Master crashes on data that was moved from different host
> ---------------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.1
>
>         Attachments: 3445_0.90.0.patch
>



[jira] Commented: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985464#action_12985464 ] 

Hudson commented on HBASE-3445:
-------------------------------

Integrated in HBase-TRUNK #1719 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1719/])
    

> Master crashes on data that was moved from different host
> ---------------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.1
>
>         Attachments: 3445-refactor.txt, 3445-v2.txt, 3445_0.90.0.patch
>
>
> While testing an upgrade to 0.90.0 RC3 I noticed that if I seeded our test data on one machine and transferred it to another machine, the HMaster on the new machine dies on startup.
> Based on the following stack trace, it looks as though it is attempting to find the .META. region at the IP address of the original machine.  Instead of waiting around for RegionServers to register with new location data, HMaster throws its hands up with a FATAL exception.
> Note that deleting the zookeeper dir makes no difference.
> Also note that so far I have only reproduced this in my own environment, using the hbase-trx extension of HBase and an ApplicationStarter that starts the Master and RegionServer together in the same JVM.  While the issue seems likely independent of those factors, it is far from a vanilla HBase environment.
> I will spend some time trying to reproduce the issue in a proper HBase test.  But perhaps someone can beat me to it?  How do I simulate the IP switch? May require a data.tar upload. 
> [14/01/11 10:45:20] 6396   [     Thread-298] ERROR server.quorum.QuorumPeerConfig  - Invalid configuration, only one server specified (ignoring)
> [14/01/11 10:45:21] 7178   [           main] INFO  ion.service.HBaseRegionService  - troove> region port:       60010
> [14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> region interface:  org.apache.hadoop.hbase.ipc.IndexedRegionInterface
> [14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> root dir: hdfs://localhost:8701/hbase
> [14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> Initializing region server.
> [14/01/11 10:45:21] 7631   [           main] INFO  ion.service.HBaseRegionService  - troove> Starting region server thread.
> [14/01/11 10:46:54] 100764 [        HMaster] FATAL he.hadoop.hbase.master.HMaster  - Unhandled exception. Starting shutdown.
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=192.168.1.102/192.168.1.102:60020]
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:311)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:865)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:732)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:258)
> 	at $Proxy14.getProtocolVersion(Unknown Source)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:384)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:283)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:478)
> 	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:435)
> 	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:382)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:277)
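
The behavior the report asks for — treating a connect failure against a stale address as transient and retrying once a RegionServer re-registers, rather than aborting with a FATAL error — can be sketched roughly as below. Class and method names here are illustrative only, not the actual HBase patch.

```java
// Hypothetical sketch: retry meta-location verification instead of aborting
// the master on the first connect failure to a stale server address.
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class MetaVerifySketch {
    interface MetaLocator {
        // Returns true when a live server confirms it is hosting .META.
        boolean verifyMetaRegionLocation() throws IOException;
    }

    // Retry verification up to maxAttempts, treating a connect failure as
    // "location is stale; wait for a RegionServer to re-register".
    static boolean verifyWithRetries(MetaLocator locator, int maxAttempts,
                                     long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (locator.verifyMetaRegionLocation()) return true;
            } catch (IOException e) {
                // Stale address (e.g. old IP after the data moved hosts):
                // do not abort the master; fall through and retry.
            }
            Thread.sleep(sleepMillis);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Fails twice (stale IP), then succeeds once a server has registered.
        MetaLocator locator = () -> {
            if (calls.incrementAndGet() < 3) throw new IOException("connect timeout");
            return true;
        };
        System.out.println(verifyWithRetries(locator, 5, 10));
    }
}
```

Running the sketch prints `true`: the first two attempts hit the simulated stale address and are retried rather than rethrown.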



[jira] Updated: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3445:
-------------------------

    Attachment: 3445-v2.txt

Here is a test that manufactures the condition James sees.  His patch fixes it (I just added DEBUG logging to his patch).  I'm going to commit, though I'm not going to include my test because of HBASE-3456 "Fix hardcoding of 20 second socket timeout down in HBaseClient".  I don't want to add a gratuitous 20-second wait to our test suite (not that anyone would notice the extra 20 seconds on top of an hour-plus suite).
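
The HBASE-3456 complaint above is that the 20-second socket timeout is hardcoded in HBaseClient, so a test cannot shorten it. A minimal sketch of reading it from configuration with the old value as the default follows; the property key `ipc.socket.timeout` is an assumption for illustration, not necessarily the key HBase adopted.

```java
// Illustrative only: make a socket connect timeout configurable instead of
// hardcoding 20 seconds, so tests can dial it down.
import java.util.Properties;

public class TimeoutConfigSketch {
    static final int DEFAULT_SOCKET_TIMEOUT_MS = 20000; // old hardcoded value

    // Read the timeout from configuration, falling back to the old default.
    static int socketTimeoutMs(Properties conf) {
        return Integer.parseInt(
            conf.getProperty("ipc.socket.timeout",          // assumed key
                             String.valueOf(DEFAULT_SOCKET_TIMEOUT_MS)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(socketTimeoutMs(conf));          // default
        conf.setProperty("ipc.socket.timeout", "1000");     // a test shortens it
        System.out.println(socketTimeoutMs(conf));
    }
}
```

With no key set the sketch prints `20000`, then `1000` after the override, which is exactly what a test suite needs to avoid a gratuitous 20-second wait.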




[jira] Updated: (HBASE-3445) Master crashes on data that was moved from different host

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HBASE-3445:
---------------------------------

    Summary: Master crashes on data that was moved from different host  (was: Master crashes on when data moved to different host)

> Master crashes on data that was moved from different host
> ---------------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 3445_0.90.0.patch
>
>
