Posted to issues@hive.apache.org by "Marton Bod (Jira)" <ji...@apache.org> on 2022/01/12 13:38:00 UTC

[jira] [Resolved] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization

     [ https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Bod resolved HIVE-25843.
-------------------------------
    Resolution: Fixed

> Add flag to disable Iceberg FileIO config serialization
> -------------------------------------------------------
>
>                 Key: HIVE-25843
>                 URL: https://issues.apache.org/jira/browse/HIVE-25843
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hive serializes the Iceberg table object into each individual split. Since the FileIO is part of the Iceberg table and carries its own Hadoop configuration, this configuration becomes the dominant factor in the size of the serialized split. In our tests, this serialized config made Iceberg splits 15-20x larger than regular Hive splits, which led to out-of-memory (OOM) errors in some of our performance tests.
> This PR proposes a config flag that turns off this config serialization and lets the deserializing side fill in the config values instead. This works for Hive executors, since they already have all the config values at hand. Based on local tests, this can reduce the Iceberg split size by ~20x.
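
The mechanism described in the issue can be illustrated with a minimal, self-contained Java sketch. It is not the Hive or Iceberg implementation: `SketchFileIO`, `skipConfigSerialization`, and `fillConfIfMissing` are hypothetical names, and a plain `Map<String, String>` stands in for the Hadoop `Configuration`. The sketch uses Java serialization's private `writeObject`/`readObject` hooks to leave the config out of the serialized form when the flag is set, then refills it on the receiving side from the executor's own config.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class SplitSerializationSketch {

    // Hypothetical stand-in for a FileIO that carries its own copy of a Hadoop-style config.
    static class SketchFileIO implements Serializable {
        private static final long serialVersionUID = 1L;
        // Hypothetical flag: when true, the config is not written into the serialized form.
        private final boolean skipConfigSerialization;
        // Transient: written (or skipped) manually in writeObject below.
        private transient Map<String, String> conf;

        SketchFileIO(Map<String, String> conf, boolean skipConfigSerialization) {
            this.conf = conf;
            this.skipConfigSerialization = skipConfigSerialization;
        }

        Map<String, String> conf() { return conf; }

        // Executor side: if the config was not shipped, fill it from the local config.
        void fillConfIfMissing(Map<String, String> localConf) {
            if (conf == null) {
                conf = new HashMap<>(localConf);
            }
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes skipConfigSerialization only
            if (!skipConfigSerialization) {
                out.writeObject(new HashMap<>(conf));
            }
        }

        @SuppressWarnings("unchecked")
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            conf = skipConfigSerialization ? null : (Map<String, String>) in.readObject();
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Synthetic config with many entries, standing in for a large Hadoop Configuration.
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            conf.put("some.property." + i, "value-" + i);
        }

        byte[] full = serialize(new SketchFileIO(conf, false));
        byte[] slim = serialize(new SketchFileIO(conf, true));
        System.out.println("with config:    " + full.length + " bytes");
        System.out.println("without config: " + slim.length + " bytes");

        // The config was not shipped, so restore it from the executor's local config.
        SketchFileIO io = (SketchFileIO) deserialize(slim);
        io.fillConfIfMissing(conf);
        System.out.println("restored entries: " + io.conf().size());
    }
}
```

Running `main` prints the serialized size with and without the config. The exact byte counts depend on the JVM and on the synthetic config, so they are not meant to reproduce the ~20x reduction reported in the issue. The design only works when, as the issue notes for Hive executors, the deserializing side really does have the needed config values locally.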



--
This message was sent by Atlassian Jira
(v8.20.1#820001)