Posted to issues@impala.apache.org by "Vihang Karajgaonkar (Jira)" <ji...@apache.org> on 2019/11/11 07:19:00 UTC

[jira] [Resolved] (IMPALA-9139) Invalidate metadata adds all the tables to background loading pool unnecessarily

     [ https://issues.apache.org/jira/browse/IMPALA-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vihang Karajgaonkar resolved IMPALA-9139.
-----------------------------------------
    Resolution: Not A Problem

> Invalidate metadata adds all the tables to background loading pool unnecessarily
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-9139
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9139
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>
> I see the following code in the reset() method of CatalogServiceCatalog:
> {code:java}
>       // Build a new DB cache, populate it, and replace the existing cache in one
>       // step.
>       Map<String, Db> newDbCache = new ConcurrentHashMap<String, Db>();
>       List<TTableName> tblsToBackgroundLoad = new ArrayList<>();
>       try (MetaStoreClient msClient = getMetaStoreClient()) {
>         List<String> allDbs = msClient.getHiveClient().getAllDatabases();
>         int numComplete = 0;
>         for (String dbName: allDbs) {
>           if (isBlacklistedDb(dbName)) {
>             LOG.info("skip blacklisted db: " + dbName);
>             continue;
>           }
>           String annotation = String.format("invalidating metadata - %s/%s dbs complete",
>               numComplete++, allDbs.size());
>           try (ThreadNameAnnotator tna = new ThreadNameAnnotator(annotation)) {
>             dbName = dbName.toLowerCase();
>             Db oldDb = oldDbCache.get(dbName);
>             Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient,
>                 dbName, oldDb);
>             if (invalidatedDb == null) continue;
>             newDbCache.put(dbName, invalidatedDb.first);
>             tblsToBackgroundLoad.addAll(invalidatedDb.second);
>           }
>         }
>       }
>       dbCache_.set(newDbCache);
>       // Identify any deleted databases and add them to the delta log.
>       Set<String> oldDbNames = oldDbCache.keySet();
>       Set<String> newDbNames = newDbCache.keySet();
>       oldDbNames.removeAll(newDbNames);
>       for (String dbName: oldDbNames) {
>         Db removedDb = oldDbCache.get(dbName);
>         updateDeleteLog(removedDb);
>       }
>       // Submit tables for background loading.
>       for (TTableName tblName: tblsToBackgroundLoad) {
>         tableLoadingMgr_.backgroundLoad(tblName);
>       }
> {code}
> Note that the tables are submitted for background loading without checking the flag {{loadInBackground_}}. This means that even if the flag is unset, issuing an invalidate metadata command causes every table in the system to be loaded in the background. This code only loads the tables; it does not add the loaded tables to the catalog, which is good, because otherwise the memory footprint of the catalog would grow after every invalidate metadata command.
> This bug has 2 implications:
> 1. We waste a lot of CPU cycles without getting anything out of it.
> 2. The more subtle side effect is that this fills up the {{tableLoadingDeque_}}, so any other background load task takes longer to complete.
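
The guard the report describes as missing can be sketched as follows. This is a minimal, self-contained illustration and not Impala's code: the class `BackgroundLoadGuardSketch`, its methods, and the use of plain strings as table names are all hypothetical, and only the roles of `loadInBackground_` and `tableLoadingDeque_` are taken from the description. Since the issue was resolved as Not A Problem, the real code may well apply an equivalent check elsewhere.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the catalog; NOT Impala's CatalogServiceCatalog.
class BackgroundLoadGuardSketch {
  // Plays the role of loadInBackground_ from the report.
  private final boolean loadInBackground;
  // Plays the role of tableLoadingDeque_: pending background load requests.
  private final Deque<String> tableLoadingDeque = new ArrayDeque<>();

  BackgroundLoadGuardSketch(boolean loadInBackground) {
    this.loadInBackground = loadInBackground;
  }

  // Simulates the last step of reset(): enqueue tables only when the flag is set.
  void submitAfterReset(List<String> tblsToBackgroundLoad) {
    if (!loadInBackground) return;
    for (String tblName : tblsToBackgroundLoad) {
      tableLoadingDeque.addLast(tblName);
    }
  }

  // Number of load requests still waiting in the deque.
  int pendingLoads() {
    return tableLoadingDeque.size();
  }

  public static void main(String[] args) {
    List<String> tables = Arrays.asList("db1.t1", "db1.t2", "db2.t1");

    BackgroundLoadGuardSketch disabled = new BackgroundLoadGuardSketch(false);
    disabled.submitAfterReset(tables);

    BackgroundLoadGuardSketch enabled = new BackgroundLoadGuardSketch(true);
    enabled.submitAfterReset(tables);

    System.out.println("flag unset: " + disabled.pendingLoads() + " pending");
    System.out.println("flag set:   " + enabled.pendingLoads() + " pending");
  }
}
```

With the flag unset, the deque stays empty after a reset, so other background load tasks are not queued behind loads nobody asked for; with the flag set, every invalidated table is enqueued as before.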



--
This message was sent by Atlassian Jira
(v8.3.4#803005)