Posted to issues@hive.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2023/05/18 08:16:00 UTC

[jira] [Updated] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin

     [ https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated HIVE-27356:
-------------------------------------
    Summary: Hive should write name of blob type instead of table name in Puffin  (was: Hive should write name of blob type instead of table name in Puffing)

> Hive should write name of blob type instead of table name in Puffin
> -------------------------------------------------------------------
>
>                 Key: HIVE-27356
>                 URL: https://issues.apache.org/jira/browse/HIVE-27356
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>
> Currently, Hive writes the table name plus the snapshot ID as the blob type:
> [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
> Instead, it should write the name of the blob type it writes. The table name and snapshot ID are redundant information anyway, as they can be inferred from the location and file name of the Puffin file.
> Currently, it writes a non-standard blob (standard blob types are listed [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]). I think it would be better to write standard blobs for interoperability. But if Hive wants to write non-standard blobs anyway, it should still give them a descriptive type name, e.g. 'hive-column-statistics-v1'.
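As a rough illustration of why a fixed, descriptive blob type matters, the Java sketch below models a subset of Puffin blob metadata with a hypothetical `BlobMetadata` record (not Iceberg's `org.apache.iceberg.puffin.Blob`) and a hypothetical `classify` reader that dispatches on the type string alone. Only "apache-datasketches-theta-v1" is taken from Iceberg's StandardBlobTypes; "hive-column-statistics-v1" is the example name from the issue, and the "-" separator used to mimic the current table-name-plus-snapshot-ID type is an assumption.

```java
import java.util.List;
import java.util.Set;

public class PuffinBlobTypeSketch {
    // Hypothetical stand-in for a subset of the Puffin blob metadata fields
    // (type, fields, snapshot-id, sequence-number); not Iceberg's Blob class.
    record BlobMetadata(String type, List<Integer> fields, long snapshotId, long sequenceNumber) {}

    // Standard blob type listed in Iceberg's StandardBlobTypes.
    static final String APACHE_DATASKETCHES_THETA_V1 = "apache-datasketches-theta-v1";
    static final Set<String> STANDARD_TYPES = Set.of(APACHE_DATASKETCHES_THETA_V1);

    // Descriptive name for Hive's non-standard blob, taken from the issue's example.
    static final String HIVE_COLUMN_STATS_V1 = "hive-column-statistics-v1";

    // Current behavior per HIVE-27356: the type is derived from the table name
    // plus the snapshot ID. The "-" separator is an assumption.
    static String currentBlobType(String tableName, long snapshotId) {
        return tableName + "-" + snapshotId;
    }

    // Hypothetical reader that decides how to handle a blob by its type alone.
    static String classify(String blobType) {
        if (STANDARD_TYPES.contains(blobType)) {
            return "standard";
        }
        if (HIVE_COLUMN_STATS_V1.equals(blobType)) {
            return "hive-column-statistics";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        long snapshotId = 1234L; // illustrative value
        BlobMetadata before = new BlobMetadata(
                currentBlobType("default.tbl", snapshotId), List.of(1), snapshotId, 1L);
        BlobMetadata after = new BlobMetadata(
                HIVE_COLUMN_STATS_V1, List.of(1), snapshotId, 1L);

        System.out.println(before.type() + " -> " + classify(before.type())); // default.tbl-1234 -> unknown
        System.out.println(after.type() + " -> " + classify(after.type()));   // hive-column-statistics-v1 -> hive-column-statistics
    }
}
```

A reader that dispatches on the blob type can only recognize the second blob: the first type string changes with every table and snapshot, so no reader can know it in advance.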



--
This message was sent by Atlassian Jira
(v8.20.10#820010)