Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/09/18 19:53:02 UTC
[jira] [Created] (IMPALA-5952) Query waiting indefinitely for table metadata to arrive

Alexander Behm created IMPALA-5952:
--------------------------------------

             Summary: Query waiting indefinitely for table metadata to arrive
                 Key: IMPALA-5952
                 URL: https://issues.apache.org/jira/browse/IMPALA-5952
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog, Frontend
    Affects Versions: Impala 2.8.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
            Reporter: Alexander Behm
            Assignee: Alexander Behm
            Priority: Critical


Impala queries may hang indefinitely while waiting for the metadata of a deleted table to arrive through a statestore topic update. You will see many messages like this in the log of the impalad coordinating the hung query:
{code}
Missing tables were not received in 120000ms. Load request will be retried. <list of tables>
{code}
If one of the tables mentioned in those log messages has been deleted, then you may be hitting this issue.

This code in Frontend#getMissingTbls() clearly shows the bug:
{code}
  private Set<TableName> getMissingTbls(Set<TableName> tableNames) {
    Set<TableName> missingTbls = new HashSet<TableName>();
    for (TableName tblName: tableNames) {
      Db db = getCatalog().getDb(tblName.getDb());
      if (db == null) continue; // <--- wrong! database has been dropped and may never arrive
      Table tbl = db.getTable(tblName.getTbl());
      if (tbl == null) continue; // <--- wrong! table has been dropped and may never arrive
      if (!tbl.isLoaded()) missingTbls.add(tblName);
    }
    return missingTbls;
  }
{code}
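
In the snippet above, the returned set leaves out any name that is absent from the local catalog, so a dropped table looks the same to the caller as one that is already loaded. The following toy sketch makes the three possible outcomes explicit. It is not Impala code and not the committed fix: the class TableStateSketch, its State enum, and the flattening of database and table into a single map key are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; none of these names are part of Impala's API.
final class TableStateSketch {
  enum State { LOADED, NOT_LOADED, ABSENT }

  // Local cache: fully qualified table name -> whether its metadata is loaded.
  private final Map<String, Boolean> cache = new HashMap<>();

  void put(String name, boolean loaded) { cache.put(name, loaded); }

  void drop(String name) { cache.remove(name); }

  State stateOf(String name) {
    Boolean loaded = cache.get(name);
    if (loaded == null) return State.ABSENT;
    return loaded ? State.LOADED : State.NOT_LOADED;
  }

  // Same shape as getMissingTbls(): only NOT_LOADED names are returned, so
  // ABSENT and LOADED names are indistinguishable to the caller.
  Set<String> missing(Set<String> names) {
    Set<String> result = new HashSet<>();
    for (String name : names) {
      if (stateOf(name) == State.NOT_LOADED) result.add(name);
    }
    return result;
  }

  // Names the local cache does not know about at all (for example, dropped).
  Set<String> absent(Set<String> names) {
    Set<String> result = new HashSet<>();
    for (String name : names) {
      if (stateOf(name) == State.ABSENT) result.add(name);
    }
    return result;
  }

  public static void main(String[] args) {
    TableStateSketch catalog = new TableStateSketch();
    Set<String> requested = new HashSet<>();
    requested.add("db.t");

    catalog.put("db.t", false);
    System.out.println("before drop: missing=" + catalog.missing(requested));

    catalog.drop("db.t");
    System.out.println("after drop:  missing=" + catalog.missing(requested)
        + " absent=" + catalog.absent(requested));
  }
}
```

With the absent names reported separately, a caller could decide whether to keep waiting or to fail the request. Which of the two is right for Impala is not something this sketch settles.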

Getting into this hung state requires an elaborate series of events, for example:
* impalad A requests table T to be loaded and gets into the wait loop
* impalad B issues a "DROP TABLE T"
* catalogd loads the metadata for table T
* statestored requests topic update from catalogd; update includes T
* statestored sends update to impalad B
* impalad B completes the "DROP TABLE T" operation
* statestored requests topic update from catalogd; update includes deletion of T
* statestored sends update to impalad A which includes the deletion of table T
* impalad A is still in the wait loop; the metadata for T will never arrive because T has been dropped

Notice how impalad A may "skip" the first update for T which includes its metadata. This typically only happens on very busy clusters where the statestore has trouble sending all catalog snapshots to all subscribers (i.e. some subscribers skip some snapshots).
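
The effect of a skipped snapshot can be reproduced in a small simulation. This is a toy model under stated assumptions, not Impala's statestore protocol: the Update and Subscriber classes, the version numbers, and the bounded wait over a fixed list of delivered updates are all hypothetical. A subscriber that receives both updates sees T become loaded; a subscriber that only receives the deletion never does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of topic updates; not Impala's actual classes.
final class SkippedUpdateSketch {
  enum Kind { LOADED, DELETED }

  // One catalog change as delivered to a subscriber.
  static final class Update {
    final long version;
    final String table;
    final Kind kind;

    Update(long version, String table, Kind kind) {
      this.version = version;
      this.table = table;
      this.kind = kind;
    }
  }

  // A subscriber (an "impalad") with a local cache of table -> isLoaded.
  static final class Subscriber {
    private final Map<String, Boolean> cache = new HashMap<>();

    // The subscriber starts out knowing the table, but without its metadata.
    Subscriber(String knownUnloadedTable) {
      cache.put(knownUnloadedTable, false);
    }

    void apply(Update u) {
      if (u.kind == Kind.LOADED) {
        cache.put(u.table, true);
      } else {
        cache.remove(u.table);
      }
    }

    boolean isLoaded(String table) {
      return Boolean.TRUE.equals(cache.get(table));
    }

    // Applies the delivered updates in order and returns the version at which
    // the table became loaded, or -1 if that never happened. The real wait
    // loop is unbounded; this one ends when the delivered list is exhausted.
    long waitForLoad(String table, List<Update> delivered) {
      for (Update u : delivered) {
        apply(u);
        if (isLoaded(table)) return u.version;
      }
      return -1;
    }
  }

  // Runs the scenario for one subscriber waiting on table "T".
  static long run(boolean skipMetadataUpdate) {
    List<Update> delivered = new ArrayList<>();
    if (!skipMetadataUpdate) delivered.add(new Update(2, "T", Kind.LOADED));
    delivered.add(new Update(3, "T", Kind.DELETED));
    return new Subscriber("T").waitForLoad("T", delivered);
  }

  public static void main(String[] args) {
    System.out.println("received both updates: loaded at version " + run(false));
    System.out.println("skipped first update:  loaded at version " + run(true));
  }
}
```

A result of -1 in the second case corresponds to impalad A in the list above: the only update it ever sees for T is the deletion, so the condition it is waiting on cannot become true.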

*Workaround*
* Re-create tables with the same names as the deleted ones (schema and format do not matter; only the database and table names must match)
* Might need to run "invalidate metadata <table>" on them
* Once the hung queries have finished (failed or succeeded), the re-created tables can be dropped again



