You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Aman Raj (Jira)" <ji...@apache.org> on 2022/11/17 07:23:00 UTC

[jira] [Updated] (HIVE-19416) Create single version transactional table metastore statistics for aggregation queries

     [ https://issues.apache.org/jira/browse/HIVE-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Raj updated HIVE-19416:
----------------------------
    Fix Version/s: 3.2.0

> Create single version transactional table metastore statistics for aggregation queries
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-19416
>                 URL: https://issues.apache.org/jira/browse/HIVE-19416
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>             Fix For: 3.2.0, 4.0.0-alpha-1
>
>
> This adds accurate stats support to Hive transactional (insert-only and full ACID) tables, so that some queries from these tables can be answered from stats and also so that the stats could be used for more optimization. This support can be enabled via a config flag, and is on by default.
> This is achieved via the following changes, that basically start us on the path of treating ACID stats the same way we treat ACID data:
> In addition to existing JSON blob, we store a write ID of the latest stats writer with each table and partition. Any writer updating the stats or altering the stats state for a txn table has to record his write ID - if the write ID and some other context is not provided by the caller of alter, the alter table/partition operation fails. It's the responsibility of the writer to not commit its transaction if the operation fails.
> In future, we'd like to move the stats state into actual stats tables, but for now it's (logically) colocated with the existing json parameter.
> In addition to its write ID, most callers (with the exception of the ones that cannot have races, e.g. create table) provide their own txn state (write ID list) for the table. The existing stats' write ID is verified against this state. If the write ID is not visible to the updater, we still update the stats, but set the stats state to invalid; that basically means that two parallel operations that cannot see each others' data output are updating the stats.
> This way, txn stats stay valid as the result of a sequence of non-conflicting updates that can all see each other and account for each others' data for the stats. Any parallel updates invalidate the stats.
> This is necessary because unlike data, stats are a single version summary of the table. To be able to support parallel operations with valid stats, each stats update would need to write a separate record, and would also need to write mergeable records that only reflect its own changes, instead of the final view of the table stats (two of those are hard or impossible to merge).
> This approach resulted in a few changes to alter/etc APIs in metastore; it also requires that many alter operations, as well as analyze table, allocate a write ID (because they affect stats-that-are-treated-like-data, and so are essentially a write operation).
> The reader, in turn, verifies that the stats are valid and written by a write ID that is itself valid given the reader's transactional state (i.e. not aborted, nor in progress). This is done on metastore side; if the stats are invalid for the reader, we transparently update the stats state returned to the caller to mark the stats as inaccurate.
> We've considered (and actually implemented) an alternative approach of recording the full txn state of the stats writer to be compared with the state of the stats reader (to see if they are compatible and avoid the extra write IDs and strict write-time checks), however it results in problems with partitioned tables, where not all writes affect all partitions, and so the stats state of all the untouched partitions becomes invalid once a subset of partitions is updated (because we cannot tell whether the write ID, a table level operation, didn't touch the partition, or did touch it but didn't record the stats). Additionally, storing full txn state for every partition and table can be expensive, especially in extreme cases where the watermark doesn't advance for a while for some reason.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)