You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2019/12/20 22:34:00 UTC

[jira] [Resolved] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table

     [ https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-9779.
----------------------------------
    Resolution: Won't Fix

Stale. Context is different now.

> IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table 
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-9779
>                 URL: https://issues.apache.org/jira/browse/HBASE-9779
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.96.0
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Critical
>         Attachments: 9779part.txt
>
>
> As part of the test, we want to delete the created table to restore cluster state.  Interestingly we can disable the table successfully but then immediately after we fail the delete because we cannot get the table descriptor -- getting the file descriptor is used to test if table is present.
> The test for getDescriptor is kinda broke because it throws base IOE which causes clients to retry over and over again as though the descriptor was going to come back.
> This bug is kinda ugly because in at least one case it caused our long-running hbase-it suite run to fail so would be good to fix.
> Here is sample from a test run:
> {code}
> Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO  [main] client.HBaseAdmin: Started disable of IntegrationTestLoadAndVerify
> 2013-10-11 18:27:53,526 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:53,527 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:53,527 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate using SASL (unknown error)
> 2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,529 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
> 2013-10-11 18:27:53,539 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c70, negotiated timeout = 40000
> 2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,662 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c70 closed
> 2013-10-11 18:27:53,662 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> .2013-10-11 18:27:54,666 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:54,667 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:54,667 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate using SASL (unknown error)
> 2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,667 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
> 2013-10-11 18:27:54,696 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c71, negotiated timeout = 40000
> 2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,871 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c71 closed
> 2013-10-11 18:27:54,871 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> .2013-10-11 18:27:55,890 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:55,891 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:55,891 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate using SASL (unknown error)
> 2013-10-11 18:27:55,891 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
> 2013-10-11 18:27:55,892 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
> 2013-10-11 18:27:55,896 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c72, negotiated timeout = 40000
> 2013-10-11 18:27:55,908 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
> 2013-10-11 18:27:55,912 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c72 closed
> 2013-10-11 18:27:55,912 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
>  2389 ms
> Deleting table IntegrationTestLoadAndVerify 
> 2013-10-11 18:33:02,029 INFO  [main] hbase.HBaseCluster: Restoring cluster - started
> 2013-10-11 18:33:02,032 INFO  [main] hbase.HBaseCluster: Added new HBaseAdmin
> 2013-10-11 18:33:02,032 INFO  [main] hbase.HBaseCluster: Restoring cluster - done
> 2013-10-11 18:33:02,033 ERROR [main] util.AbstractHBaseTool: Error running command-line tool
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
> Fri Oct 11 18:27:55 PDT 2013, org.apache.hadoop.hbase.client.RpcRetryingCaller@48c6f1e8, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: HTableDescriptor missing for IntegrationTestLoadAndVerify
> 	at org.apache.hadoop.hbase.master.handler.TableEventHandler.getTableDescriptor(TableEventHandler.java:231)
> 	at org.apache.hadoop.hbase.master.handler.DeleteTableHandler.prepareWithTableLock(DeleteTableHandler.java:58)
> 	at org.apache.hadoop.hbase.master.handler.TableEventHandler.prepare(TableEventHandler.java:93)
> 	at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1816)
> 	at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1826)
> 	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38213)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> 	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> ....
> {code}
> Looking on master logs, it looks fine:
> {code}
>   1 2013-10-11 18:27:55,541 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0] handler.DisableTableHandler: Disable waiting until done; 298996 ms remaining; []
>   0 2013-10-11 18:27:55,546 INFO  [MASTER_TABLE_OPERATIONS-a1805:60000-0] handler.DisableTableHandler: Disabled table is done=true
>   1 2013-10-11 18:27:55,555 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0] lock.ZKInterProcessLockBase: Released /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000001
>   2 2013-10-11 18:27:55,923 INFO  [RpcServer.handler=14,port=60000] master.HMaster: Client=hbase//10.20.200.112 delete IntegrationTestLoadAndVerify
>   3 2013-10-11 18:27:55,931 DEBUG [RpcServer.handler=14,port=60000] lock.ZKInterProcessLockBase: Acquired a lock for /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
>   4 2013-10-11 18:27:55,954 DEBUG [RpcServer.handler=14,port=60000] lock.ZKInterProcessLockBase: Released /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
>   5 2013-10-11 18:27:56,167 INFO  [RpcServer.handler=15,port=60000] master.HMaster: Client=hbase//10.20.200.112 delete IntegrationTestLoadAndVerify
> ...
> {code}
> Oddly no exceptions other than corrupted snapshot because mssing hfile.  All seems to have been operating fine.  Will look more.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)