Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2019/08/27 10:25:00 UTC

[jira] [Commented] (IMPALA-8809) Refresh a subset of partitions for ACID tables

    [ https://issues.apache.org/jira/browse/IMPALA-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916598#comment-16916598 ] 

Gabor Kaszab commented on IMPALA-8809:
--------------------------------------

Update: This is probably not feasible, as compactions done by Hive aren't reflected in writeId changes. So relying solely on table-level or partition-level writeIds to decide whether a table has to be refreshed is not enough.
Currently I can't think of a way to improve the refresh logic so that it avoids refreshing all the partitions. One experiment would be to check how timestamp fields like lastDdlTime (or something like that) behave. If we find that they are updated both after compaction and after table schema modification, then they could be used to determine the subset of partitions to refresh.
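
To illustrate how such timestamps could be used if the experiment works out, here is a minimal sketch. It assumes each partition exposes a single integer timestamp (for example lastDdlTime) and that this value changes after both compaction and schema modification, which is exactly what still needs to be verified. The function name and the dict-based inputs are hypothetical and not part of Impala.

```python
from typing import Dict, List


def partitions_to_refresh(cached: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Return the names of partitions whose timestamp differs from the cached one.

    `cached` maps partition name -> timestamp seen at the last load (hypothetical).
    `current` maps partition name -> timestamp reported now (hypothetical).
    """
    # New partitions and partitions with a changed timestamp need a reload.
    changed = [name for name, ts in current.items() if cached.get(name) != ts]
    # Partitions that no longer exist must be dropped from the local state too.
    dropped = [name for name in cached if name not in current]
    return sorted(changed + dropped)
```

If the timestamps turn out not to change after compaction, this comparison would miss compacted partitions, and a full refresh would still be needed.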

> Refresh a subset of partitions for ACID tables
> ----------------------------------------------
>
>                 Key: IMPALA-8809
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8809
>             Project: IMPALA
>          Issue Type: Improvement
>    Affects Versions: Impala 3.3.0
>            Reporter: Gabor Kaszab
>            Priority: Critical
>              Labels: impala-acid
>
> Enhancing REFRESH logic to handle ACID tables was covered by this change: https://issues.apache.org/jira/browse/IMPALA-8600
> Basically, each user-initiated REFRESH PARTITION is rejected, while the REFRESH_PARTITION events in the event processor actually do a full table load for ACID tables.
> There is room for improvement: When a full table refresh is being executed on an ACID table we can have 2 scenarios:
> - If there were schema changes, then reload the full table. Identifying such a scenario should be possible by checking the table-level writeId. However, there is a bug in Hive: it doesn't update that field for partitioned tables (https://issues.apache.org/jira/browse/HIVE-22062). This would be the desired way, but it could also be worked around by checking other fields like lastDdlChanged or such.
> - If a full table refresh is not needed then we should fetch the partition-level writeIds and reload only the ones that are out-of-date locally.
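
The two scenarios in the description can be sketched roughly as follows. This is only an illustration under stated assumptions: `TableSnapshot`, `plan_refresh` and the string constants are hypothetical names, not Impala code, and the sketch assumes that a changed table-level writeId signals a schema change, as the description proposes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

FULL_RELOAD = "full"        # hypothetical marker: reload the whole table
PARTIAL_RELOAD = "partial"  # hypothetical marker: reload only stale partitions


@dataclass
class TableSnapshot:
    """Hypothetical view of the writeIds known for one ACID table."""
    table_write_id: int
    partition_write_ids: Dict[str, int] = field(default_factory=dict)


def plan_refresh(local: TableSnapshot, remote: TableSnapshot) -> Tuple[str, List[str]]:
    """Decide between a full reload and reloading only out-of-date partitions."""
    # Scenario 1: the table-level writeId changed, so reload everything.
    if remote.table_write_id != local.table_write_id:
        return FULL_RELOAD, sorted(remote.partition_write_ids)
    # Scenario 2: reload only partitions whose writeId differs from the local one.
    stale = [
        name
        for name, write_id in remote.partition_write_ids.items()
        if local.partition_write_ids.get(name) != write_id
    ]
    return PARTIAL_RELOAD, sorted(stale)
```

As noted in the update at the top of this message, compactions done by Hive do not change writeIds, so a comparison like this would not detect compacted partitions on its own.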


