Posted to dev@hbase.apache.org by "Feng Honghua (JIRA)" <ji...@apache.org> on 2014/02/28 08:48:19 UTC
[jira] [Created] (HBASE-10636) HBaseAdmin.deleteTable isn't
'really' synchronous in that still some cleanup in HMaster after client
thinks deleteTable() succeeds
Feng Honghua created HBASE-10636:
------------------------------------
Summary: HBaseAdmin.deleteTable isn't 'really' synchronous in that still some cleanup in HMaster after client thinks deleteTable() succeeds
Key: HBASE-10636
URL: https://issues.apache.org/jira/browse/HBASE-10636
Project: HBase
Issue Type: Sub-task
Components: Client, master
Reporter: Feng Honghua
Assignee: Feng Honghua
In HBaseAdmin.deleteTable():
{code}
public void deleteTable(final TableName tableName) throws IOException {
  // Wait until all regions deleted
  for (int tries = 0; tries < (this.numRetries * this.retryLongerMultiplier); tries++) {
    // let us wait until hbase:meta table is updated and
    // HMaster removes the table from its HTableDescriptors
    if (values == null || values.length == 0) {
      tableExists = false;
      GetTableDescriptorsResponse htds;
      MasterKeepAliveConnection master = connection.getKeepAliveMasterService();
      try {
        GetTableDescriptorsRequest req =
            RequestConverter.buildGetTableDescriptorsRequest(tableName);
        htds = master.getTableDescriptors(null, req);
      } catch (ServiceException se) {
        throw ProtobufUtil.getRemoteException(se);
      } finally {
        master.close();
      }
      tableExists = !htds.getTableSchemaList().isEmpty();
      if (!tableExists) {
        break;
      }
    }
  }
{code}
The client considers deleteTable() successful as soon as it can no longer retrieve the table descriptor from HMaster.
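The client-side check above boils down to a bounded polling loop. The following is a minimal, self-contained sketch of that pattern, not the HBaseAdmin code itself: the class name, the waitUntilGone helper, and the fixed pause between tries are all assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

/** Illustrative only: a bounded polling loop in the spirit of deleteTable(). */
class PollUntilGoneSketch {

  /**
   * Polls tableExists up to maxTries times, pausing pauseMs between tries.
   * Returns true as soon as the check reports the table is gone,
   * false if it still exists after the last try.
   */
  static boolean waitUntilGone(BooleanSupplier tableExists, int maxTries, long pauseMs) {
    for (int tries = 0; tries < maxTries; tries++) {
      if (!tableExists.getAsBoolean()) {
        return true; // descriptor no longer visible: the client treats this as success
      }
      try {
        Thread.sleep(pauseMs); // hypothetical fixed pause between tries
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Stand-in for the descriptor lookup: the table "exists" for the first two checks.
    boolean gone = waitUntilGone(() -> ++calls[0] < 3, 5, 1L);
    System.out.println("gone=" + gone + " after " + calls[0] + " checks");
  }
}
```

Note that the loop only observes one piece of master state (the descriptor). Nothing in it can tell whether any other cleanup on the master has finished.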
But in HMaster, DeleteTableHandler, which actually deletes the table, does the following:
{code}
protected void handleTableOperation(List<HRegionInfo> regions)
    throws IOException, KeeperException {
  // 1. Wait because of region in transition
  ....
  // 2. Remove regions from META
  LOG.debug("Deleting regions from META");
  MetaEditor.deleteRegions(this.server.getCatalogTracker(), regions);
  // 3. Move the table in /hbase/.tmp
  MasterFileSystem mfs = this.masterServices.getMasterFileSystem();
  Path tempTableDir = mfs.moveTableToTemp(tableName);
  try {
    // 4. Delete regions from FS (temp directory)
    FileSystem fs = mfs.getFileSystem();
    for (HRegionInfo hri: regions) {
      LOG.debug("Archiving region " + hri.getRegionNameAsString() + " from FS");
      HFileArchiver.archiveRegion(fs, mfs.getRootDir(),
          tempTableDir, new Path(tempTableDir, hri.getEncodedName()));
    }
    // 5. Delete table from FS (temp directory)
    if (!fs.delete(tempTableDir, true)) {
      LOG.error("Couldn't delete " + tempTableDir);
    }
    LOG.debug("Table '" + tableName + "' archived!");
  } finally {
    // 6. Update table descriptor cache
    LOG.debug("Removing '" + tableName + "' descriptor.");
    this.masterServices.getTableDescriptors().remove(tableName);
    // 7. Clean up regions of the table in RegionStates.
    LOG.debug("Removing '" + tableName + "' from region states.");
    states.tableDeleted(tableName);
    // 8. If entry for this table in zk, and up in AssignmentManager, remove it.
    LOG.debug("Marking '" + tableName + "' as deleted.");
    am.getZKTable().setDeletedTable(tableName);
  }
  if (cpHost != null) {
    cpHost.postDeleteTableHandler(this.tableName);
  }
}
{code}
Removing the regions from RegionStates, marking the table as deleted in ZooKeeper, and calling the coprocessor's postDeleteTableHandler all happen after the table is removed from the TableDescriptor cache.
So client code that relies on RegionStates/ZKTable/CP having been cleaned up once deleteTable() returns may fail if its requests reach HMaster before those three cleanup steps are done.
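To make the window concrete, here is a toy model of the ordering described above. It is a sketch under stated assumptions, not HBase code: the three sets are hypothetical stand-ins for the descriptor cache, RegionStates and ZKTable, and the afterDescriptorRemoved hook marks the point at which a polling client could first see the table as "deleted".

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the master-side cleanup order; all names here are hypothetical. */
class DeleteOrderingSketch {
  final Set<String> descriptors = new HashSet<>();  // stand-in for the descriptor cache
  final Set<String> regionStates = new HashSet<>(); // stand-in for RegionStates
  final Set<String> zkTables = new HashSet<>();     // stand-in for ZKTable

  DeleteOrderingSketch(String table) {
    descriptors.add(table);
    regionStates.add(table);
    zkTables.add(table);
  }

  /** What the client's polling loop effectively checks. */
  boolean clientThinksDeleted(String table) {
    return !descriptors.contains(table);
  }

  /** True only once every piece of modelled master state is gone. */
  boolean fullyCleaned(String table) {
    return !descriptors.contains(table)
        && !regionStates.contains(table)
        && !zkTables.contains(table);
  }

  /** Cleans up in the handler's order; the hook runs right after descriptor removal. */
  void delete(String table, Runnable afterDescriptorRemoved) {
    descriptors.remove(table);      // corresponds to step 6
    afterDescriptorRemoved.run();   // a client request may arrive here
    regionStates.remove(table);     // corresponds to step 7
    zkTables.remove(table);         // corresponds to step 8
  }

  public static void main(String[] args) {
    DeleteOrderingSketch master = new DeleteOrderingSketch("t1");
    master.delete("t1", () -> System.out.println(
        "in window: clientThinksDeleted=" + master.clientThinksDeleted("t1")
            + ", fullyCleaned=" + master.fullyCleaned("t1")));
    System.out.println("after handler: fullyCleaned=" + master.fullyCleaned("t1"));
  }
}
```

In this model, any observer running inside the hook sees the table as deleted from the client's point of view while the remaining state is still present, which is the situation the failing tests run into.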
In fact, when I add a sleep of about 200ms after the line below to simulate a slow-running HMaster:
{code}
this.masterServices.getTableDescriptors().remove(tableName);
{code}
some unit tests (such as moveRegion, or verifying the postDeleteTable CP hook immediately after deleteTable) no longer pass.