Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/20 14:50:00 UTC

[jira] [Work logged] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files

     [ https://issues.apache.org/jira/browse/HIVE-24670?focusedWorklogId=538476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538476 ]

ASF GitHub Bot logged work on HIVE-24670:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jan/21 14:49
            Start Date: 20/Jan/21 14:49
    Worklog Time Spent: 10m 
      Work Description: szlta opened a new pull request #1894:
URL: https://github.com/apache/hive/pull/1894


   If delete delta caching is turned off, the plain record reader inside DeleteReaderValue allocates a batch whose schema is equivalent to that of an insert delta.
   
   This is unnecessary, as the struct part in a delete delta file is always empty. When there are many delete delta files (e.g. due to compaction failures) and the table definition is wide (e.g. 200+ cols), this puts a significant amount of memory pressure on the executor, even though these empty structures will never be filled or otherwise utilized.
   
   To counter this, I propose passing this record reader an ACID schema with an empty struct part.
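
   To illustrate the proposal, here is a minimal, self-contained sketch. It is not the code from the pull request. It only builds the ACID schema as a type string, once with the table's columns in the row struct and once with an empty row struct, and gives a rough estimate of the memory the row vectors would otherwise take. The class and method names are hypothetical. The estimate assumes a batch size of 1024 rows, that every table column is backed by a long vector (8 bytes per value plus a 1-byte null flag), and it ignores object headers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; not part of Hive.
final class AcidSchemaSketch {

    // Assumed batch size (the default row count of a vectorized batch).
    static final int BATCH_SIZE = 1024;

    // Builds the ACID schema type string: five metadata fields followed by
    // the "row" struct that holds the table's own columns.
    static String acidSchema(Map<String, String> rowColumns) {
        StringJoiner row = new StringJoiner(",", "struct<", ">");
        rowColumns.forEach((name, type) -> row.add(name + ":" + type));
        return "struct<operation:int,originalTransaction:bigint,bucket:int,"
                + "rowId:bigint,currentTransaction:bigint,row:" + row + ">";
    }

    // Rough lower bound on bytes held by the row vectors, assuming every
    // table column is long-backed and each open reader owns one batch.
    static long estimatedRowVectorBytes(int tableColumns, int openReaders) {
        long perColumn = (long) BATCH_SIZE * (Long.BYTES + 1);
        return perColumn * tableColumns * openReaders;
    }

    public static void main(String[] args) {
        Map<String, String> wide = new LinkedHashMap<>();
        for (int i = 0; i < 200; i++) {
            wide.put("c" + i, "bigint");
        }
        // Schema an insert delta reader needs: the row struct carries all columns.
        System.out.println(acidSchema(wide).length() + " chars in the full schema");
        // Schema sufficient for a delete delta reader: the row struct is empty.
        System.out.println(acidSchema(new LinkedHashMap<>()));
        // Estimated bytes wasted with 200 columns and 100 delete delta readers.
        System.out.println(estimatedRowVectorBytes(200, 100) + " bytes");
    }
}
```

   With an empty row struct the reader still gets the five ACID metadata columns it needs to identify deleted rows, while the per-column vectors for the table's own columns are never allocated.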
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 538476)
    Remaining Estimate: 0h
            Time Spent: 10m

> DeleteReaderValue should not allocate empty vectors for delete delta files
> --------------------------------------------------------------------------
>
>                 Key: HIVE-24670
>                 URL: https://issues.apache.org/jira/browse/HIVE-24670
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If delete delta caching is turned off, the plain record reader inside DeleteReaderValue allocates a batch whose schema is equivalent to that of an insert delta.
> This is unnecessary, as the struct part in a delete delta file is always empty. When there are many delete delta files (e.g. due to compaction failures) and the table definition is wide (e.g. 200+ cols), this puts a significant amount of memory pressure on the executor, even though these empty structures will never be filled or otherwise utilized.
> To counter this, I propose passing this record reader an ACID schema with an empty struct part.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)