Posted to dev@whirr.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2011/05/25 19:06:47 UTC
[jira] [Created] (WHIRR-314) HBase integration test can fail due to
Thrift server race
HBase integration test can fail due to Thrift server race
---------------------------------------------------------
Key: WHIRR-314
URL: https://issues.apache.org/jira/browse/WHIRR-314
Project: Whirr
Issue Type: Bug
Reporter: Tom White
Assignee: Tom White
There is a race condition where the Thrift server comes up faster than the master, fails to connect (after trying 10 times), then shuts down for good. Both Andrei and I have seen this fail on Rackspace Cloud Servers.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (WHIRR-314) HBase integration test can fail due to
Thrift server race
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/WHIRR-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated WHIRR-314:
----------------------------
Attachment: WHIRR-314.patch
Updated patch which addresses Andrei's comment. I'm going to commit this now.
[jira] [Commented] (WHIRR-314) HBase integration test can fail due
to Thrift server race
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/WHIRR-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039212#comment-13039212 ]
Tom White commented on WHIRR-314:
---------------------------------
Here's a stack trace from the Thrift server node:
{noformat}
2011-05-25 16:40:19,672 INFO org.apache.hadoop.hbase.client.HConnectionManager$TableServers: getMaster attempt 9 of 10 failed; no more retrying.
java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:481)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:377)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:381)
    at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:78)
    at org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.<init>(ThriftServer.java:191)
    at org.apache.hadoop.hbase.thrift.ThriftServer.doMain(ThriftServer.java:817)
    at org.apache.hadoop.hbase.thrift.ThriftServer.main(ThriftServer.java:874)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:477)
    ... 6 more
2011-05-25 16:40:19,677 INFO org.apache.zookeeper.ZooKeeper: Session: 0x1302806aebc0001 closed
2011-05-25 16:40:19,678 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <173-203-217-78.static.cloud-ips.com:2181:/hbase,org.apache.hadoop.hbase.client.HConnectionManager>Closed connection with ZooKeeper; /hbase/root-region-server
{noformat}
[jira] [Commented] (WHIRR-314) HBase integration test can fail due
to Thrift server race
Posted by "Andrei Savu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/WHIRR-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039299#comment-13039299 ]
Andrei Savu commented on WHIRR-314:
-----------------------------------
+1 and we need the same change for CDH HBase in {{services/cdh/src/main/resources/functions/configure_cdh_hbase.sh}}.
Side note: later we should make sure that tests do not block forever but instead fail after a reasonable amount of time (all the cleanup work is annoying).
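One way to keep a test from blocking forever is to poll against a deadline and report failure once it passes. The sketch below is only an illustration of that idea in Python; `wait_until` and its parameters are hypothetical names and are not part of Whirr or its test suite.

```python
import time

def wait_until(predicate, timeout_s, poll_interval_s=0.01):
    """Poll predicate() until it returns True or timeout_s seconds elapse.

    Returns True if the predicate succeeded, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False  # give up instead of blocking forever
        time.sleep(poll_interval_s)

# Hypothetical usage: fail the test if the service never becomes ready.
if not wait_until(lambda: False, timeout_s=0.05):
    print("service did not become ready in time; failing the test")
```

A test harness built this way can tear the cluster down itself when the deadline is missed, rather than leaving the cleanup to whoever notices the hung run.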
[jira] [Updated] (WHIRR-314) HBase integration test can fail due to
Thrift server race
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/WHIRR-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated WHIRR-314:
----------------------------
Attachment: WHIRR-314.patch
This patch fixes the problem by increasing the number of retries to 100. I ran the integration test and it passed.
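The patch itself is not shown in this thread. HBase clients take their retry count from the `hbase.client.retries.number` property, so the patch presumably raises that value, but that is an assumption. The effect of a larger retry budget on this race can be sketched with a small Python simulation; the function name and the point at which the master becomes available are made up for illustration.

```python
def connect_with_retries(master_is_up, max_attempts):
    """Try to reach the master up to max_attempts times.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed (the point at which the Thrift server would shut down).
    """
    for attempt in range(1, max_attempts + 1):
        if master_is_up(attempt):
            return attempt
    return None

# Hypothetical timing: the master only registers by the 30th attempt.
MASTER_READY_AT = 30

def master_is_up(attempt):
    return attempt >= MASTER_READY_AT

print(connect_with_retries(master_is_up, max_attempts=10))   # gives up
print(connect_with_retries(master_is_up, max_attempts=100))  # connects
```

Raising the limit does not remove the race; it only widens the window in which the master may appear. A master that takes longer than the full retry budget to start would still cause the same failure.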
[jira] [Resolved] (WHIRR-314) HBase integration test can fail due
to Thrift server race
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/WHIRR-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White resolved WHIRR-314.
-----------------------------
Resolution: Fixed
Fix Version/s: 0.5.0
I've just committed this.