Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/09/09 15:31:00 UTC

[jira] [Commented] (IMPALA-7961) Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast

    [ https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192952#comment-17192952 ] 

ASF subversion and git services commented on IMPALA-7961:
---------------------------------------------------------

Commit 0c89a9d562c280507a6e842898bf3e41cadc3ff1 in impala's branch refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0c89a9d ]

IMPALA-10140: Fix CatalogException for creating database with sync_ddl as true

IMPALA-7961 handled the case of "create table if not exists" queries
with sync_ddl set to true. Customers reported a similar issue with
"create database if not exists" queries with sync_ddl set to true.
This patch applies the same fix as IMPALA-7961 to
CatalogOpExecutor.createDatabase().

Testing:
 - Manual tests
   Since this is a racy bug, I could only reproduce it by forcing
   frequent topicUpdateLog GCs along with a specific sequence of
   actions, like: run some DDLs and REFRESHs to trigger a GC in
   topicUpdateLog, then run query "create database if not exists" with
   sync_ddl as true. Verified that the issue couldn't be reproduced
   after applying this patch.
 - Passed exhaustive test.

Change-Id: Id623118f8938f416414c45d93404fb70d036a9df
Reviewed-on: http://gerrit.cloudera.org:8080/16421
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-7961
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7961
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.12.0, Impala 3.1.0
>            Reporter: Bharath Vissapragada
>            Assignee: Bharath Vissapragada
>            Priority: Critical
>             Fix For: Impala 3.2.0
>
>         Attachments: 0001-Repro-of-IMPALA-7961.patch
>
>
> When catalog server is under heavy load with concurrent updates to objects, queries with SYNC_DDL can fail with the following message.
> *User facing error message:*
> {noformat}
> ERROR: CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 3 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
> {noformat}
> *Exception from the catalog server log:*
> {noformat}
> I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic version (msec): 1088
> I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic version (msec): 12625
> I1031 00:00:49.168851 1131986 jni-util.cc:230] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 3 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>         at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
>         at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
>         at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
> ::::
> {noformat}
> *What this means*
> The catalog operation actually succeeds (the change has been committed to HMS and the Catalog server cache), but the Catalog server notices that broadcasting the changes is taking longer than expected (for whatever reason) and, instead of continuing to wait, fails fast. The coordinators are expected to eventually sync up in the background.
> *Problem*
>  - This violates the contract of the SYNC_DDL query option since the query returns early.
>  - This is a behavioral regression from the pre-IMPALA-5058 behavior, where queries would wait indefinitely for SYNC_DDL-based changes to propagate.
> *Notes*
>  - Introduced by IMPALA-5058
>  - Based on the occurrences of this issue, we narrowed it down to a specific kind of DDL (see Jira comments).
>  - My understanding is that this also applies to the Catalog V2 (or LocalCatalog mode) since we still rely on the CatalogServer for DDL orchestration and hence it takes this codepath.
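
The failure mode described above can be modeled roughly as follows. This is a minimal sketch, not Impala's actual code: the class SyncDdlSketch, its methods, and the "db:foo" key format are all hypothetical, and it assumes the version lookup fails because the object's entry was garbage-collected from the topic update log, as the manual repro in the commit message suggests. The real logic lives in CatalogServiceCatalog.waitForSyncDdlVersion(), per the stack trace.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a topic update log; not Impala's implementation.
class SyncDdlSketch {
    // Maps a catalog object key to the topic version in which it was last sent.
    private final Map<String, Long> topicUpdateLog = new ConcurrentHashMap<>();

    // Records that the object was included in the given topic version.
    void recordUpdate(String key, long topicVersion) {
        topicUpdateLog.put(key, topicVersion);
    }

    // Drops entries older than the given version, as a periodic GC might.
    void garbageCollect(long oldestVersionToKeep) {
        topicUpdateLog.values().removeIf(v -> v < oldestVersionToKeep);
    }

    // Bounded lookup: returns the topic version for the key, or empty after
    // maxAttempts failed lookups. A real system would wait for the next
    // topic update between attempts; this sketch only counts attempts.
    OptionalLong findTopicVersion(String key, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Long version = topicUpdateLog.get(key);
            if (version != null) {
                return OptionalLong.of(version);
            }
        }
        return OptionalLong.empty();
    }
}
```

In this model, an "if not exists" DDL on an object whose log entry has already been collected never finds a version to wait on, so a bounded lookup gives up even though the object itself is present in the catalog.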


