Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/09/20 21:15:03 UTC

[jira] [Resolved] (IMPALA-5952) Query waiting indefinitely for table metadata to arrive

     [ https://issues.apache.org/jira/browse/IMPALA-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Behm resolved IMPALA-5952.
------------------------------------
    Resolution: Not A Bug

Looking at the code more carefully, this is not a bug after all. getMissingTbls() will not return the dropped table, so query analysis will proceed and should report "Table not found".
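
For illustration only, here is a minimal, self-contained sketch of that reasoning. It is not Impala's actual code: the MissingTablesSketch class, its addTable()/dropTable() helpers, and the use of "db.tbl" strings in place of TableName are all assumptions made to keep the example runnable. Only the loop body mirrors the getMissingTbls() snippet quoted below.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the impalad-side catalog cache (not Impala's classes).
public class MissingTablesSketch {
  // Hypothetical table entry that only tracks whether its metadata is loaded.
  static final class Table {
    private final boolean loaded;
    Table(boolean loaded) { this.loaded = loaded; }
    boolean isLoaded() { return loaded; }
  }

  // Database name -> (table name -> table entry).
  private final Map<String, Map<String, Table>> dbs = new HashMap<>();

  public void addTable(String db, String tbl, boolean loaded) {
    dbs.computeIfAbsent(db, k -> new HashMap<>()).put(tbl, new Table(loaded));
  }

  public void dropTable(String db, String tbl) {
    Map<String, Table> tables = dbs.get(db);
    if (tables != null) tables.remove(tbl);
  }

  // Same control flow as the quoted Frontend#getMissingTbls().
  // Assumption: every name has the form "db.tbl".
  public Set<String> getMissingTbls(Set<String> tableNames) {
    Set<String> missingTbls = new HashSet<>();
    for (String name : tableNames) {
      String[] parts = name.split("\\.", 2);
      Map<String, Table> db = dbs.get(parts[0]);
      if (db == null) continue;    // dropped database: not reported as missing
      Table tbl = db.get(parts[1]);
      if (tbl == null) continue;   // dropped table: not reported as missing
      if (!tbl.isLoaded()) missingTbls.add(name);
    }
    return missingTbls;
  }

  public static void main(String[] args) {
    MissingTablesSketch catalog = new MissingTablesSketch();
    catalog.addTable("db1", "t", false);  // known to the catalog, not yet loaded
    Set<String> requested = Collections.singleton("db1.t");

    // Table exists but is unloaded, so a caller would keep waiting for it.
    System.out.println(catalog.getMissingTbls(requested));  // prints [db1.t]

    // After the drop, the table is no longer reported as missing.
    catalog.dropTable("db1", "t");
    System.out.println(catalog.getMissingTbls(requested));  // prints []
  }
}
```

In this sketch, once the drop is applied the returned set is empty, so a caller looping until the set is empty would stop waiting and move on to analysis, where the absent table can be reported.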

> Query waiting indefinitely for table metadata to arrive
> -------------------------------------------------------
>
>                 Key: IMPALA-5952
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5952
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: Alexander Behm
>            Assignee: Alexander Behm
>            Priority: Critical
>              Labels: hang
>
> Impala queries may hang indefinitely while waiting for the metadata of a deleted table to arrive through a statestore topic update. You will see many messages like this in the log of the impalad coordinating the hung query:
> {code}
> Missing tables were not received in 120000ms. Load request will be retried. <list of tables>
> {code}
> If one of the tables mentioned in those log messages has been deleted, then you may be hitting this issue.
> This code in Frontend#getMissingTbls() clearly shows the bug:
> {code}
>   private Set<TableName> getMissingTbls(Set<TableName> tableNames) {
>     Set<TableName> missingTbls = new HashSet<TableName>();
>     for (TableName tblName: tableNames) {
>       Db db = getCatalog().getDb(tblName.getDb());
>       if (db == null) continue;  // <--- wrong! database has been dropped and may never arrive
>       Table tbl = db.getTable(tblName.getTbl());
>       if (tbl == null) continue;  // <--- wrong! table has been dropped and may never arrive
>       if (!tbl.isLoaded()) missingTbls.add(tblName);
>     }
>     return missingTbls;
>   }
> {code}
> Getting into this hung state requires an elaborate series of events, for example:
> * impalad A requests table T to be loaded and gets into the wait loop
> * impalad B issues a "DROP TABLE T"
> * catalogd loads the metadata for table T
> * statestored requests topic update from catalogd; update includes T
> * statestored sends update to impalad B
> * impalad B completes the "DROP TABLE T" operation
> * statestored requests topic update from catalogd; update includes deletion of T
> * statestored sends update to impalad A which includes the deletion of table T
> * impalad A is still in the wait loop; the metadata for T will never arrive because T has been dropped
> Notice how impalad A may "skip" the first update for T which includes the metadata for T. This typically only happens on very busy clusters where the statestore has trouble sending all catalog snapshots to all subscribers in a timely fashion (i.e. some subscribers skip some snapshots).
> *Workaround*
> * Re-create tables with the same names as the deleted ones (schema and format do not matter; only the database and table names must match)
> * Might need to run "invalidate metadata <table>" on them
> * Once the hung queries have finished (failed or succeeded), the re-created tables can be dropped again



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)