Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/06/30 05:38:00 UTC
[jira] [Updated] (IMPALA-12256) Stale DROP_PARTITION events might not be skipped correctly

     [ https://issues.apache.org/jira/browse/IMPALA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-12256:
------------------------------------
    Fix Version/s:     (was: Impala 4.1.0)
                       (was: Impala 4.2.0)
                       (was: Impala 4.1.1)
                       (was: Impala 4.1.2)

> Stale DROP_PARTITION events might not be skipped correctly
> ----------------------------------------------------------
>
>                 Key: IMPALA-12256
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12256
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> Since IMPALA-10502, we track the create event id of each db/table/partition when it is created. It is used to skip stale DROP events, i.e. events that were generated before the object was created.
> However, some DDLs, such as COMPUTE INCREMENTAL STATS, lose the create event id when reloading partitions. As a result, stale DROP_PARTITION events are not skipped correctly.
> This can be reproduced by setting a higher value for "hms_event_polling_interval_s", so that the DROP_PARTITION event is processed after COMPUTE INCREMENTAL STATS finishes.
> {code:bash}
> bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=10 {code}
> Create a non-transactional partitioned table with one partition:
> {code:sql}
> create table my_part (id int) partitioned by (p int) stored as textfile;
> insert into my_part partition(p=0) values (0);{code}
> Put the commands below in a file and run them together:
> {code:sql}
> alter table my_part drop if exists partition (p=0);
> insert into my_part partition(p=0) values (0),(1),(2),(3);
> compute incremental stats my_part partition(p=0);
> {code}
> In the catalogd logs, we can see the partition being dropped by the DROP_PARTITION event:
> {code:java}
> I0630 13:27:11.840737 17106 CatalogOpExecutor.java:4484] EventId: 8316831 Skipping removal of 0/1 partitions since they don't exist or were created later in table default.my_part.
> I0630 13:27:11.841095 17106 MetastoreEvents.java:628] EventId: 8316831 EventType: DROP_PARTITION 1 partitions dropped from table default.my_part
> {code}
> This event should be skipped since the partition is recreated after it. Although a follow-up ADD_PARTITION event (generated by the statement that recreates the partition) will add the partition back, there is a period between the two events during which the metadata is incorrect (the existing partition is missing from it).
> The cause is that we lose the create_event_id of the recreated partition when reloading it for COMPUTE INCREMENTAL STATS. Other DDLs could cause the same issue, e.g. ALTER TABLE DROP STATS.
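
The skip rule and the way a reload can defeat it can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Impala's catalog code: the class StaleDropEventSketch, its methods, the -1 placeholder for an unknown create event id, and the event ids 100 and 101 are all hypothetical. It assumes a DROP_PARTITION event is skipped only when the cached partition's create event id is greater than the event's id.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model of skipping stale DROP_PARTITION events by create event id.
 * All names are hypothetical; this is not Impala's catalog implementation.
 */
class StaleDropEventSketch {
  /** Assumed placeholder meaning "create event id unknown". */
  static final long UNKNOWN_EVENT_ID = -1;

  /** Partition name -> id of the event that created the cached partition. */
  final Map<String, Long> createEventIds = new HashMap<>();

  void addPartition(String name, long createEventId) {
    createEventIds.put(name, createEventId);
  }

  /** Reload that models the bug: the reloaded partition forgets its create event id. */
  void reloadLosingCreateEventId(String name) {
    createEventIds.put(name, UNKNOWN_EVENT_ID);
  }

  /** Reload that carries the create event id over to the reloaded partition. */
  void reloadKeepingCreateEventId(String name) {
    Long old = createEventIds.get(name);
    createEventIds.put(name, old == null ? UNKNOWN_EVENT_ID : old);
  }

  /** Applies a DROP_PARTITION event. Returns true if the partition was removed. */
  boolean processDropPartitionEvent(long eventId, String name) {
    Long createEventId = createEventIds.get(name);
    if (createEventId == null) return false;   // partition is not cached
    if (createEventId > eventId) return false; // created after the event: stale, skip
    createEventIds.remove(name);
    return true;
  }

  public static void main(String[] args) {
    // Hypothetical ids: DROP_PARTITION is event 100, the recreating ADD_PARTITION is 101.
    StaleDropEventSketch buggy = new StaleDropEventSketch();
    buggy.addPartition("p=0", 101L);
    buggy.reloadLosingCreateEventId("p=0");
    System.out.println("id lost on reload, partition dropped: "
        + buggy.processDropPartitionEvent(100L, "p=0"));

    StaleDropEventSketch fixed = new StaleDropEventSketch();
    fixed.addPartition("p=0", 101L);
    fixed.reloadKeepingCreateEventId("p=0");
    System.out.println("id kept on reload, partition dropped: "
        + fixed.processDropPartitionEvent(100L, "p=0"));
  }
}
```

In this model the first case removes the recreated partition because the unknown id does not compare as newer than the stale event, while the second case skips the event. Whether the actual fix preserves the id in this way is not stated in the report.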


