Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/09/18 19:53:02 UTC
[jira] [Created] (IMPALA-5952) Query waiting indefinitely for table metadata to arrive
Alexander Behm created IMPALA-5952:
--------------------------------------
Summary: Query waiting indefinitely for table metadata to arrive
Key: IMPALA-5952
URL: https://issues.apache.org/jira/browse/IMPALA-5952
Project: IMPALA
Issue Type: Bug
Components: Catalog, Frontend
Affects Versions: Impala 2.8.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
Reporter: Alexander Behm
Assignee: Alexander Behm
Priority: Critical
Impala queries may hang indefinitely while waiting for the metadata of a deleted table to arrive through a statestore topic update. You will see many messages like this in the log of the impalad coordinating the hung query:
{code}
Missing tables were not received in 120000ms. Load request will be retried. <list of tables>
{code}
If any of the tables mentioned in those log messages has been deleted, you may be hitting this issue.
This code in Frontend#getMissingTbls() clearly shows the bug:
{code}
private Set<TableName> getMissingTbls(Set<TableName> tableNames) {
  Set<TableName> missingTbls = new HashSet<TableName>();
  for (TableName tblName: tableNames) {
    Db db = getCatalog().getDb(tblName.getDb());
    if (db == null) continue;  // <--- wrong! database has been dropped and may never arrive
    Table tbl = db.getTable(tblName.getTbl());
    if (tbl == null) continue;  // <--- wrong! table has been dropped and may never arrive
    if (!tbl.isLoaded()) missingTbls.add(tblName);
  }
  return missingTbls;
}
{code}
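To make the control flow of the quoted method concrete, here is a small self-contained Java harness that copies its logic. The `Db`, `Table` and `TableName` types and the map-based catalog are hypothetical stand-ins, not Impala's classes, and the catalog is passed in as a parameter instead of coming from `getCatalog()`. The harness shows what the method returns: a name whose database or table is absent from the local catalog is left out of the result by the two `continue` branches, and only a table that is present but not yet loaded is reported as missing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MissingTblsDemo {
  // Hypothetical stand-ins for Impala's TableName, Table and Db classes.
  record TableName(String db, String tbl) {}

  static final class Table {
    private final boolean loaded;
    Table(boolean loaded) { this.loaded = loaded; }
    boolean isLoaded() { return loaded; }
  }

  static final class Db {
    final Map<String, Table> tables = new HashMap<>();
    Table getTable(String name) { return tables.get(name); }
  }

  // Same control flow as the quoted Frontend#getMissingTbls(); the local catalog
  // (db name -> Db) is a parameter here. Dropped objects are simply absent from it.
  static Set<TableName> getMissingTbls(Map<String, Db> catalog, Set<TableName> tableNames) {
    Set<TableName> missingTbls = new HashSet<>();
    for (TableName tblName : tableNames) {
      Db db = catalog.get(tblName.db());
      if (db == null) continue;   // database not in the local catalog: name is skipped
      Table tbl = db.getTable(tblName.tbl());
      if (tbl == null) continue;  // table not in the local catalog: name is skipped
      if (!tbl.isLoaded()) missingTbls.add(tblName);  // known but not yet loaded
    }
    return missingTbls;
  }

  public static void main(String[] args) {
    Db db = new Db();
    db.tables.put("loaded", new Table(true));
    db.tables.put("unloaded", new Table(false));
    Map<String, Db> catalog = new HashMap<>();
    catalog.put("d", db);

    Set<TableName> requested = Set.of(
        new TableName("d", "loaded"),
        new TableName("d", "unloaded"),
        new TableName("d", "dropped"),   // table absent from the catalog
        new TableName("gone", "t"));     // database absent from the catalog
    // Prints [TableName[db=d, tbl=unloaded]]
    System.out.println(getMissingTbls(catalog, requested));
  }
}
```

Only the unloaded table survives the filter; the dropped table and the table in the dropped database are silently skipped rather than reported.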
Getting into this hung state requires an elaborate series of events, for example:
* impalad A requests table T to be loaded and gets into the wait loop
* impalad B issues a "DROP TABLE T"
* catalogd loads the metadata for table T
* statestored requests topic update from catalogd; update includes T
* statestored sends update to impalad B
* impalad B completes the "DROP TABLE T" operation
* statestored requests topic update from catalogd; update includes deletion of T
* statestored sends update to impalad A which includes the deletion of table T
* impalad A is still in the wait loop; the metadata for T will never arrive because T has been dropped
Notice how impalad A may "skip" the first update for T, which includes its metadata. This typically only happens on very busy clusters where the statestore has trouble sending all catalog snapshots to all subscribers (i.e., some subscribers skip some snapshots).
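The skipped update can be illustrated with a deliberately simplified toy model. It is an assumption made for illustration only, not how the statestore or the catalog topic is actually implemented: each subscriber round delivers only the newest catalog snapshot published so far, so a subscriber that is served late never observes the intermediate version in which T was loaded. The names `Snapshot`, `TState`, `deliver` and `sawLoaded` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SkippedSnapshotDemo {
  // Hypothetical, simplified state of table T in one catalog snapshot.
  enum TState { UNLOADED, LOADED, DROPPED }

  // A catalog snapshot: a version number plus the state of T at that version.
  record Snapshot(long version, TState t) {}

  // Toy delivery model: at each round the subscriber only gets the newest snapshot
  // published so far. publishedAtRound[i] is how many snapshots catalogd had
  // published when round i reached this subscriber.
  static List<Snapshot> deliver(List<Snapshot> published, int[] publishedAtRound) {
    List<Snapshot> received = new ArrayList<>();
    for (int count : publishedAtRound) {
      received.add(published.get(count - 1));
    }
    return received;
  }

  // Would a loop waiting for T to become LOADED ever be satisfied?
  static boolean sawLoaded(List<Snapshot> received) {
    for (Snapshot s : received) {
      if (s.t() == TState.LOADED) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    List<Snapshot> published = List.of(
        new Snapshot(1, TState.UNLOADED),  // impalad A starts waiting for T
        new Snapshot(2, TState.LOADED),    // catalogd has loaded T
        new Snapshot(3, TState.DROPPED));  // "DROP TABLE T" issued by impalad B

    // impalad B keeps up and receives every snapshot.
    List<Snapshot> atB = deliver(published, new int[] {1, 2, 3});
    // impalad A is served late: by its second round catalogd is already at version 3.
    List<Snapshot> atA = deliver(published, new int[] {1, 3});

    System.out.println("B saw T loaded: " + sawLoaded(atB));  // true
    System.out.println("A saw T loaded: " + sawLoaded(atA));  // false
  }
}
```

In this model impalad B observes T as loaded, while impalad A goes straight from an unloaded T to a dropped T, matching the sequence of events above.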
*Workaround*
* Re-create tables with the same names as the deleted ones (schema and format do not matter; only the database and table names must match)
* You might need to run "invalidate metadata <table>" on them
* Once the hung queries have finished (failed or succeeded), the re-created tables can be dropped again
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)