Posted to issues-all@impala.apache.org by "liuyao (Jira)" <ji...@apache.org> on 2021/07/03 09:18:00 UTC

[jira] [Assigned] (IMPALA-5476) Catalogd restart bring about metadata is out of sync

     [ https://issues.apache.org/jira/browse/IMPALA-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuyao reassigned IMPALA-5476:
------------------------------

    Assignee: liuyao

> Catalogd restart bring about metadata is out of sync 
> -----------------------------------------------------
>
>                 Key: IMPALA-5476
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5476
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: fengYu
>            Assignee: liuyao
>            Priority: Major
>              Labels: consistency, fault-tolerance
>
> I have been using Impala in our environment. Here is our cluster deployment:
> 20+ impalad backends.
> 4 of the impalads act as coordinators.
> One catalogd and one statestored.
> I ran into a problem where one impalad's metadata went out of sync after a catalogd restart. I found that a DML operation was executing while catalogd was restarting.
> After analyzing the Impala source code, I was able to reproduce the problem. Here are my steps and analysis:
> 1. Start the Impala cluster.
> 2. Let the cluster run for a long time with lots of metadata operations, so that the current catalogVersion_ is large (for example, greater than 10000).
> 3. Submit a DML query (such as 'insert into xx partition() select xxx') to one impalad; the query runs for about 1 minute.
> 4. While the query is running, stop catalogd, then start it again just before the query executes QueryExecState->UpdateCatalog().
> 5. UpdateCatalog() sends an UpdateCatalog request to catalogd; catalogd updates the table's metadata and responds with the table's newest metadata.
> 6. After catalogd responds, UpdateCatalog() updates the metadata cached in the impalad (by calling updateCatalogCache()), which runs the following code:
>      if (!catalogServiceId_.equals(req.getCatalog_service_id())) {
>       boolean firstRun = catalogServiceId_.equals(INITIAL_CATALOG_SERVICE_ID);
>       catalogServiceId_ = req.getCatalog_service_id();
>       if (!firstRun) {
>         // Throw an exception which will trigger a full topic update request.
>         throw new CatalogException("Detected catalog service ID change. Aborting " +
>             "updateCatalog()");
>       }
>     }
> The serviceId in the request belongs to the newly started catalogd and does not equal the impalad's catalogServiceId_, so the function throws a CatalogException and the query fails with EXCEPTION. What is more, the impalad's catalogServiceId_ is set to the new one.
> 7. Once catalogd has started successfully, it publishes all metadata to statestored, which then pushes it to the impalad. Because of step 6, the impalad's catalogServiceId_ already equals the new catalogd's serviceId, so no exception is thrown.
> 8. In the normal case, step 7 would throw the CatalogException and set from_version to 0, and statestored would send the full metadata to the impalad in the next UpdateState().
> 9. After all steps finish, the impalad is out of sync. All new metadata operations will be lost because CatalogObjectCache.add() requires that a 'new item will only be added if it has a larger catalog version'.
> By "new metadata will be lost" I mean that subsequent metadata operations on an existing table are lost until the table's version catches up with the older version. I think no operation can recover from this, because the impalad updates its locally cached metadata by comparing the new version against the older version.
> I tried some operations that trigger a reload of the table's metadata and generate a new version, such as refresh and alter table. But the new metadata is always lost until the catalog version becomes larger than the older one. For newly created catalog objects (such as create table / create database), the metadata is up to date.
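The version comparison described in step 9 can be illustrated with a small standalone sketch. This is not Impala's actual CatalogObjectCache; the class name VersionGatedCacheSketch, the object names, and the version numbers are hypothetical, and the only behavior taken from the report is that an item is added only if it has a larger catalog version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the impalad-side cache; NOT Impala's CatalogObjectCache.
class VersionGatedCacheSketch {
    // Maps a catalog object name to the catalog version of the cached copy.
    private final Map<String, Long> versions = new HashMap<>();

    // Accepts the item only if it carries a strictly larger catalog version
    // than the cached copy (or if nothing is cached yet).
    boolean add(String name, long catalogVersion) {
        Long cached = versions.get(name);
        if (cached != null && cached >= catalogVersion) {
            return false; // update is silently dropped
        }
        versions.put(name, catalogVersion);
        return true;
    }

    // Returns the cached version, or -1 if the object is not cached.
    long versionOf(String name) {
        return versions.getOrDefault(name, -1L);
    }

    public static void main(String[] args) {
        VersionGatedCacheSketch cache = new VersionGatedCacheSketch();
        // Cached before the catalogd restart, when versions were already large.
        cache.add("db.existing_table", 10000L);

        // Assumption for this sketch: the restarted catalogd hands out small
        // version numbers again.
        System.out.println(cache.add("db.existing_table", 5L));     // false: refresh is lost
        System.out.println(cache.add("db.new_table", 6L));          // true: new objects are fine
        System.out.println(cache.add("db.existing_table", 10001L)); // true: version caught up
    }
}
```

Under these assumptions, the existing table stays stale until the new catalogd's versions exceed the old cached version, while newly created objects are accepted immediately, which matches the behavior observed above.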
> I think this is a bug. We need to keep the older catalogServiceId_ until a full (non-delta) metadata update pushed by statestored has been applied, even though all metadata operations will fail with EXCEPTION during this gap. Perhaps there are better solutions.
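One way to read the proposal above is sketched below. It is only an illustration under stated assumptions, not a patch against Impala: the class DeferredServiceIdSketch, the isFullUpdate flag, and the map-based cache are all hypothetical. The idea is that a service ID change seen in a non-full update throws without adopting the new ID, so the mismatch is still detected when the next statestore update arrives, and the new ID is adopted only together with a full update that replaces the cache.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed behavior; NOT Impala's actual frontend code.
class DeferredServiceIdSketch {
    static final String INITIAL_ID = "initial";

    private String serviceId = INITIAL_ID;
    private final Map<String, Long> versions = new HashMap<>();

    // Applies an update. isFullUpdate is an assumed flag marking a full
    // (non-delta) update pushed via statestored.
    void update(String reqServiceId, boolean isFullUpdate, Map<String, Long> objects) {
        if (!serviceId.equals(reqServiceId)) {
            boolean firstRun = serviceId.equals(INITIAL_ID);
            if (!firstRun && !isFullUpdate) {
                // Keep the old ID so that later updates still see the mismatch.
                throw new IllegalStateException("Detected catalog service ID change");
            }
            // Full update from a new catalogd: drop stale entries, then adopt the ID.
            versions.clear();
            serviceId = reqServiceId;
        }
        // Within one catalogd instance, keep the larger version per object.
        for (Map.Entry<String, Long> e : objects.entrySet()) {
            versions.merge(e.getKey(), e.getValue(), Long::max);
        }
    }

    String serviceId() {
        return serviceId;
    }

    // Returns the cached version, or -1 if the object is not cached.
    long versionOf(String name) {
        return versions.getOrDefault(name, -1L);
    }

    public static void main(String[] args) {
        DeferredServiceIdSketch impalad = new DeferredServiceIdSketch();
        impalad.update("catalogd-A", true, Collections.singletonMap("db.t", 10000L));
        try {
            // DML response from the restarted catalogd: rejected, old ID kept.
            impalad.update("catalogd-B", false, Collections.singletonMap("db.t", 5L));
        } catch (IllegalStateException e) {
            System.out.println("rejected, still on " + impalad.serviceId());
        }
        // Full update from the new catalogd replaces the cache.
        impalad.update("catalogd-B", true, Collections.singletonMap("db.t", 5L));
        System.out.println(impalad.serviceId() + " db.t=" + impalad.versionOf("db.t"));
    }
}
```

The cost of this design, as noted in the report, is that every metadata operation on that impalad fails between the catalogd restart and the arrival of the full update.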



--
This message was sent by Atlassian Jira
(v8.3.4#803005)