Posted to issues@hive.apache.org by "Ádám Szita (Jira)" <ji...@apache.org> on 2021/10/28 13:13:00 UTC

[jira] [Resolved] (HIVE-25628) Avoid unnecessary file ops if Iceberg table is LLAP cached

     [ https://issues.apache.org/jira/browse/HIVE-25628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita resolved HIVE-25628.
-------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Committed to master. Thanks for the review [~mbod]!

> Avoid unnecessary file ops if Iceberg table is LLAP cached
> ----------------------------------------------------------
>
>                 Key: HIVE-25628
>                 URL: https://issues.apache.org/jira/browse/HIVE-25628
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When query execution is vectorized for an Iceberg table, we need to make an extra file open operation on the ORC file to learn the file schema (which is later matched against the logical schema).
> In an LLAP configuration the file schema can be retrieved from the LLAP cache, because ORC metadata is cached, so we should avoid the file operation when possible.
> Also: LLAP relies on cache keys that are usually triplets of file information and are constructed via an FS.listStatus call. For Iceberg tables we should rely on the file information provided by Iceberg's metadata to spare this call too.
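
The cache key idea in the last paragraph can be illustrated with a minimal sketch. It is not the committed change and uses no Hive, LLAP, or Iceberg classes: CacheKeySketch, FileKey, CountingFs, and keyFromTableMetadata are hypothetical names, and the triplet is assumed here to be (path, length, modification time). The point is only that a key built from values already held in table metadata equals the key built from a file system status lookup, so the lookup can be skipped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Illustrative only: none of these types are Hive, LLAP, or Iceberg classes. */
public final class CacheKeySketch {

    /** Hypothetical cache key; the three fields are an assumed triplet. */
    public static final class FileKey {
        final String path;
        final long length;
        final long modificationTime;

        public FileKey(String path, long length, long modificationTime) {
            this.path = Objects.requireNonNull(path);
            this.length = length;
            this.modificationTime = modificationTime;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FileKey)) return false;
            FileKey other = (FileKey) o;
            return length == other.length
                && modificationTime == other.modificationTime
                && path.equals(other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, length, modificationTime);
        }
    }

    /** Hypothetical stand-in for a file system that counts status lookups. */
    public static final class CountingFs {
        private final Map<String, long[]> files = new HashMap<>();
        public int statusCalls = 0;

        public void put(String path, long length, long modificationTime) {
            files.put(path, new long[] {length, modificationTime});
        }

        /** Builds the key the expensive way: one status lookup per file. */
        public FileKey keyViaStatusLookup(String path) {
            statusCalls++;
            long[] status = files.get(path);
            if (status == null) {
                throw new IllegalArgumentException("No such file: " + path);
            }
            return new FileKey(path, status[0], status[1]);
        }
    }

    /** Builds the same key from values the caller already holds, with no lookup. */
    public static FileKey keyFromTableMetadata(String path, long length, long modificationTime) {
        return new FileKey(path, length, modificationTime);
    }

    private CacheKeySketch() { }
}
```

In this sketch, calling keyFromTableMetadata with the same three values that keyViaStatusLookup would read yields an equal key while leaving statusCalls untouched. Whether Iceberg's metadata carries every field the real LLAP cache key needs is not stated in the description above and is assumed here.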



--
This message was sent by Atlassian Jira
(v8.3.4#803005)