Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/11/14 00:26:00 UTC

[jira] [Work logged] (HIVE-26734) Iceberg: Add an option to allow positional delete files without actual row data

     [ https://issues.apache.org/jira/browse/HIVE-26734?focusedWorklogId=825613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-825613 ]

ASF GitHub Bot logged work on HIVE-26734:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Nov/22 00:25
            Start Date: 14/Nov/22 00:25
    Worklog Time Spent: 10m 
      Work Description: ayushtkn opened a new pull request, #3758:
URL: https://github.com/apache/hive/pull/3758

   ### What changes were proposed in this pull request?
   
   Make writing the actual row data into positional delete files optional.
   
   ### Why are the changes needed?
   
   To avoid the cost of reading and writing the actual row data when reading or writing delete files.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   
   ### How was this patch tested?
   
   Added unit test coverage to verify the functionality. Also checked the delete file contents and operations in a real environment.
   
   **Output: (With Config Enabled)**
   ```
   [root@ayushsaxena-2 ~]# sudo -u hive hive --orcfiledump -d hdfs://cluster:8020/warehouse/tablespace/external/hive/ice02/data/00000-0-delete-hive_20221112171847_aba7a24a-68ec-46fa-bed4-3588f4a92e74-job_16680670596081_0052-1-00001.orc
   Processing data file hdfs://cluster:8020/warehouse/tablespace/external/hive/ice02/data/00000-0-delete-hive_20221112171847_aba7a24a-68ec-46fa-bed4-3588f4a92e74-job_16680670596081_0052-1-00001.orc [length: 1075]
   {"file_path":"hdfs:\/\/cluster:8020\/warehouse\/tablespace\/external\/hive\/ice02\/data\/00000-0-data-hive_20221112171758_ff81cb9a-d455-455d-8080-9b1507f33bd8-job_16680670596080_0052-2-00001.orc","pos":3}
   ```
   **Output: (With Config Disabled)**
   ```
   [root@ayushsaxena-2 ~]#  sudo -u hive hive --orcfiledump -d  hdfs://cluster:8020/warehouse/tablespace/external/hive/ice01/data/00000-0-delete-hive_20221112171823_8817d25c-3178-4f4b-9908-be794029cbce-job_16680670596081_0052-1-00001.orc
   Processing data file hdfs://cluster:8020/warehouse/tablespace/external/hive/ice01/data/00000-0-delete-hive_20221112171823_8817d25c-3178-4f4b-9908-be794029cbce-job_16680670596081_0052-1-00001.orc [length: 1178]
   {"file_path":"hdfs:\/\/cluster:8020\/warehouse\/tablespace\/external\/hive\/ice01\/data\/00000-0-data-hive_20221112171727_18104d4c-f5a2-4c47-8f6e-b8e4f17a18b3-job_16680670596080_0052-1-00001.orc","pos":3,"row":{"id":4}}
   ```
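
The difference between the two dumps can be checked mechanically. The following is a minimal Python sketch, not part of the patch: it parses the two records exactly as printed above and reports whether the optional `row` field is present. The helper name `describe` and the constant names are hypothetical and exist only for this illustration.

```python
import json

# Records copied verbatim from the two orcfiledump outputs above.
# Raw strings keep the JSON "\/" escapes intact for json.loads.
WITHOUT_ROW = r'{"file_path":"hdfs:\/\/cluster:8020\/warehouse\/tablespace\/external\/hive\/ice02\/data\/00000-0-data-hive_20221112171758_ff81cb9a-d455-455d-8080-9b1507f33bd8-job_16680670596080_0052-2-00001.orc","pos":3}'
WITH_ROW = r'{"file_path":"hdfs:\/\/cluster:8020\/warehouse\/tablespace\/external\/hive\/ice01\/data\/00000-0-data-hive_20221112171727_18104d4c-f5a2-4c47-8f6e-b8e4f17a18b3-job_16680670596080_0052-1-00001.orc","pos":3,"row":{"id":4}}'


def describe(record_json):
    """Hypothetical helper: return (file_path, pos, row) for one delete record.

    row is None when the record carries no row data.
    """
    record = json.loads(record_json)
    return record["file_path"], record["pos"], record.get("row")


for label, raw in (("enabled", WITHOUT_ROW), ("disabled", WITH_ROW)):
    path, pos, row = describe(raw)
    print(f"config {label}: pos={pos}, row data present={row is not None}")
```

Both records point at position 3 of their data file; only the record written with the config disabled carries the deleted row (`{"id": 4}`). This matches the reported file lengths in the dumps, 1075 with the config enabled versus 1178 with it disabled.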




Issue Time Tracking
-------------------

            Worklog Id:     (was: 825613)
    Remaining Estimate: 0h
            Time Spent: 10m

> Iceberg: Add an option to allow positional delete files without actual row data
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-26734
>                 URL: https://issues.apache.org/jira/browse/HIVE-26734
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add an option that makes the actual row data in Iceberg positional delete files optional, to avoid reading and writing large amounts of row data during query execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)