Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2020/01/03 08:50:00 UTC

[jira] [Commented] (IMPALA-9135) DDLs with sync_ddl may fail with concurrent INVALIDATE METADATA

    [ https://issues.apache.org/jira/browse/IMPALA-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007312#comment-17007312 ] 

Quanlong Huang commented on IMPALA-9135:
----------------------------------------

Ran test_concurrent_ddls.py 3373 times and hit a timeout failure for a DML statement. It's an error similar to the one for REFRESH:

In catalogd, the timeline of the two concurrent threads is:
{code:java}
Main Thread (updateCatalog for DML):
 - Get the Table object for tableA via CatalogOpExecutor#getExistingTable().
 - Call CatalogServiceCatalog#tryLockTable to lock it before any modification. Blocks waiting for versionLock_.writeLock().

InvalidateMetadata Thread:
 - Holding versionLock_.writeLock()
 - Replace the entire dbCache_. Now tableA has catalog version n1.
 - Release versionLock_.writeLock()

Main Thread:
 - Succeed in CatalogServiceCatalog#tryLockTable (though the Table object is now stale).
 - Get a new catalog version n2 (>n1) and release versionLock_.writeLock().
 - Modify the Table object and bump its catalog version to n2.
 - Waiting for version >=n2 to be sent. The current sent version is n1.
{code}
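To make the interleaving concrete, here is a single-threaded replay of the timeline above. It's only a sketch: StaleTableReplay, Table, dbCache and catalogVersion are hypothetical stand-ins for the catalogd structures, and the locks are omitted because the steps are executed in the order shown instead of being raced.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded replay of the timeline above. Class and field
// names are illustrative stand-ins, not catalogd's real code.
public class StaleTableReplay {
  // Minimal stand-in for a catalog Table object.
  static final class Table {
    final String name;
    long version;
    Table(String name, long version) { this.name = name; this.version = version; }
  }

  private Map<String, Table> dbCache = new HashMap<>();
  private long catalogVersion = 0;

  // Models INVALIDATE METADATA: the whole cache is swapped for fresh objects.
  private void invalidateMetadata() {
    Map<String, Table> fresh = new HashMap<>();
    for (String name : dbCache.keySet()) {
      fresh.put(name, new Table(name, ++catalogVersion));
    }
    dbCache = fresh;
  }

  // Replays the interleaving and returns a summary of the final state.
  public String run() {
    dbCache.put("tableA", new Table("tableA", ++catalogVersion));

    // Main thread: getExistingTable() hands out a reference to tableA.
    Table held = dbCache.get("tableA");

    // INVALIDATE METADATA wins the race for the write lock.
    invalidateMetadata();
    long n1 = dbCache.get("tableA").version;

    // Main thread: lock acquired, a new version n2 is assigned to the held
    // (now stale) object.
    long n2 = ++catalogVersion;
    held.version = n2;

    boolean same = dbCache.get("tableA") == held;
    return "same=" + same + " n1=" + n1 + " n2=" + n2;
  }

  public static void main(String[] args) {
    System.out.println(new StaleTableReplay().run());
  }
}
{code}
Running main prints "same=false n1=2 n2=3": the object that was bumped to n2 is no longer the one in the cache, which still holds tableA at n1.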
The stale tableA object with catalog version n2 won't be sent, since it can't be found in dbCache_ or topicUpdateLog_. In dbCache_, the catalog version of tableA is n1.
If there are no further catalog updates, Main Thread will hang forever.
If there are no further updates for tableA, Main Thread will run out of waiting attempts in CatalogServiceCatalog#waitForSyncDdlVersion() and then fail.
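A simplified, hypothetical model of why raising the attempt limit doesn't help: each attempt checks what the topic carries for tableA, and since only the n1 copy is ever published, no attempt can find version n2. BoundedSyncDdlWait and waitForVersion are illustrative names; this is not the real waitForSyncDdlVersion() logic.
{code:java}
import java.util.Map;

// Hypothetical model of a bounded SYNC_DDL wait. The names are illustrative
// and this is not the real CatalogServiceCatalog#waitForSyncDdlVersion().
public class BoundedSyncDdlWait {
  // `published` maps each object to the version carried by the catalog topic.
  // Each loop iteration stands for one more topic update being sent; the map
  // does not change because the stale tableA object (at n2) is never published.
  // Returns the attempt on which the target version was found, or -1.
  static int waitForVersion(Map<String, Long> published, String object,
      long targetVersion, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Long v = published.get(object);
      if (v != null && v >= targetVersion) return attempt;
      // Not covered yet: wait for the next topic update and check again.
    }
    return -1;  // out of attempts, analogous to the CatalogException in the logs
  }

  public static void main(String[] args) {
    long n1 = 2, n2 = 3;
    Map<String, Long> published = Map.of("TABLE:tableA", n1);
    System.out.println(waitForVersion(published, "TABLE:tableA", n2, 5));
    System.out.println(waitForVersion(published, "TABLE:tableA", n2, 1000));
  }
}
{code}
Both calls in main print -1; only a published version >= n2 would let the wait succeed, whatever the limit is.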

The root cause is the lack of protection between CatalogOpExecutor#getExistingTable() and CatalogServiceCatalog#tryLockTable(). A concurrent INVALIDATE METADATA can make the table object stale, so we end up locking a stale table object and modifying it. Since this pattern (getExistingTable + tryLockTable) is used in many places, fixing it may need a careful refactor. One solution is to retry when the table object becomes stale.
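A rough sketch of the retry idea, assuming (as in the timeline) that INVALIDATE METADATA swaps the cache while holding the same write lock: after taking the lock, re-check that the cached object is still the one that was looked up, and retry if it isn't. RetryOnStaleTable, getAndLockTable and the copy-on-write cache are hypothetical; this is not how catalogd is structured, and it only covers the window between getExistingTable() and tryLockTable().
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of "retry when the table object becomes stale". The
// class, fields, and methods are illustrative stand-ins, not catalogd's API.
public class RetryOnStaleTable {
  static final class Table {
    final String name;
    final ReentrantLock lock = new ReentrantLock();
    long version;
    Table(String name, long version) { this.name = name; this.version = version; }
  }

  private final ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock();
  // Assumption: copy-on-write cache, only swapped while holding the write lock.
  private volatile Map<String, Table> dbCache = new HashMap<>();
  private long catalogVersion = 0;

  void addTable(String name) {
    versionLock.writeLock().lock();
    try {
      Map<String, Table> copy = new HashMap<>(dbCache);
      copy.put(name, new Table(name, ++catalogVersion));
      dbCache = copy;
    } finally {
      versionLock.writeLock().unlock();
    }
  }

  // Models INVALIDATE METADATA: replace every Table object with a fresh one.
  void invalidateMetadata() {
    versionLock.writeLock().lock();
    try {
      Map<String, Table> fresh = new HashMap<>();
      for (String name : dbCache.keySet()) fresh.put(name, new Table(name, ++catalogVersion));
      dbCache = fresh;
    } finally {
      versionLock.writeLock().unlock();
    }
  }

  // Looks the table up, takes the write lock, and re-checks that the object is
  // still the one in the cache. On success the table gets a new version and is
  // returned with its own lock held; the caller must unlock it when done.
  Table getAndLockTable(String name, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      Table t = dbCache.get(name);  // analogous to getExistingTable()
      if (t == null) throw new IllegalArgumentException("no such table: " + name);
      versionLock.writeLock().lock();  // analogous to tryLockTable()
      try {
        if (dbCache.get(name) == t) {
          t.lock.lock();
          t.version = ++catalogVersion;
          return t;
        }
        // A concurrent invalidate replaced the object: retry with the new one.
      } finally {
        versionLock.writeLock().unlock();
      }
    }
    throw new IllegalStateException("table kept changing: " + name);
  }

  public static void main(String[] args) {
    RetryOnStaleTable catalog = new RetryOnStaleTable();
    catalog.addTable("tableA");
    catalog.invalidateMetadata();
    Table t = catalog.getAndLockTable("tableA", 3);
    try {
      System.out.println("locked tableA at version " + t.version);
    } finally {
      t.lock.unlock();
    }
  }
}
{code}
Because the identity check runs under the same write lock that INVALIDATE METADATA takes, the object can't be swapped out between the check and the version bump. It doesn't stop a later INVALIDATE METADATA from replacing the object once the write lock is released, which is part of why a broader refactor may still be needed.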

> DDLs with sync_ddl may fail with concurrent INVALIDATE METADATA
> ---------------------------------------------------------------
>
>                 Key: IMPALA-9135
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9135
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> This can be revealed by tests/custom_cluster/test_concurrent_ddls.py added in [https://gerrit.cloudera.org/c/14307]
> If INVALIDATE METADATA runs concurrently, the DDLs may run out of attempts in CatalogServiceCatalog.waitForSyncDdlVersion() while waiting for the target update to be sent, no matter how much we increase maxNumAttempts.
> The error logs:
> {code:java}
> E1107 17:34:25.092439  7353 CatalogServiceCatalog.java:2626] Couldn't retrieve the covering topic version for catalog objects. Updated objects: [TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_9_part version: 349], deleted objects: []
> I1107 17:34:25.093451  7353 jni-util.cc:288] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>         at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2630)
>         at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:414)
>         at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:167)
> I1107 17:34:25.142006  6389 catalog-server.cc:337] A catalog update with 2 entries is assembled. Catalog version: 356 Last sent catalog version: 355
> I1107 17:34:25.142168  6381 catalog-server.cc:641] Collected update: 1:TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_15_part, version=357, original size=101, compressed size=98
> I1107 17:34:25.142215  6381 catalog-server.cc:641] Collected update: 1:CATALOG_SERVICE_ID, version=357, original size=49, compressed size=52
> I1107 17:34:25.142287  7356 CatalogServiceCatalog.java:2642] Operation using SYNC_DDL is waiting for catalog topic version: 357. Time to identify topic version (msec): 19
> I1107 17:34:25.192239  6389 catalog-server.cc:337] A catalog update with 2 entries is assembled. Catalog version: 357 Last sent catalog version: 356
> I1107 17:34:25.192428  6381 catalog-server.cc:641] Collected update: 1:TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_16_part, version=358, original size=101, compressed size=98
> I1107 17:34:25.192462  6381 catalog-server.cc:641] Collected update: 1:TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_11_part, version=359, original size=101, compressed size=98
> I1107 17:34:25.192484  6381 catalog-server.cc:641] Collected update: 1:TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_12_part, version=360, original size=101, compressed size=98
> I1107 17:34:25.192535  6381 catalog-server.cc:641] Collected update: 1:CATALOG_SERVICE_ID, version=360, original size=49, compressed size=52
> I1107 17:34:25.192613  7355 CatalogServiceCatalog.java:2642] Operation using SYNC_DDL is waiting for catalog topic version: 360. Time to identify topic version (msec): 13
> I1107 17:34:25.192695  7351 CatalogServiceCatalog.java:2642] Operation using SYNC_DDL is waiting for catalog topic version: 360. Time to identify topic version (msec): 45
> I1107 17:34:25.192734  7350 CatalogServiceCatalog.java:2642] Operation using SYNC_DDL is waiting for catalog topic version: 360. Time to identify topic version (msec): 29
> I1107 17:34:25.222911  7353 status.cc:126] CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>     @          0x1c5ae50  impala::Status::Status()
>     @          0x24f7ad2  impala::JniUtil::GetJniExceptionMsg()
>     @          0x1c41987  impala::JniCall::Call<>()
>     @          0x1c3fec9  impala::JniUtil::CallJniMethod<>()
>     @          0x1c3e0e6  impala::Catalog::ExecDdl()
>     @          0x1c1ed17  CatalogServiceThriftIf::ExecDdl()
>     @          0x1cb3047  impala::CatalogServiceProcessor::process_ExecDdl()
>     @          0x1cb2d95  impala::CatalogServiceProcessor::dispatchCall()
>     @          0x1c08d65  apache::thrift::TDispatchProcessor::process()
>     @          0x20e8a0d  apache::thrift::server::TAcceptQueueServer::Task::run()
>     @          0x20de040  impala::ThriftThread::RunRunnable()
>     @          0x20df766  boost::_mfi::mf2<>::operator()()
>     @          0x20df5fc  boost::_bi::list3<>::operator()<>()
>     @          0x20df348  boost::_bi::bind_t<>::operator()()
>     @          0x20df25b  boost::detail::function::void_function_obj_invoker0<>::invoke()
>     @          0x1ffb6e9  boost::function0<>::operator()()
>     @          0x2573dea  impala::Thread::SuperviseThread()
>     @          0x257c16e  boost::_bi::list5<>::operator()<>()
>     @          0x257c092  boost::_bi::bind_t<>::operator()()
>     @          0x257c055  boost::detail::thread_data<>::run()
>     @          0x3d61599  thread_proxy
>     @     0x7f1ce6ca46b9  start_thread
>     @     0x7f1ce343f41c  clone
> E1107 17:34:25.222932  7353 catalog-server.cc:112] CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
> {code}


