Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/07/04 08:35:00 UTC

[jira] [Work logged] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files

     [ https://issues.apache.org/jira/browse/HIVE-23764?focusedWorklogId=454517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454517 ]

ASF GitHub Bot logged work on HIVE-23764:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jul/20 08:34
            Start Date: 04/Jul/20 08:34
    Worklog Time Spent: 10m 
      Work Description: pvary merged pull request #1185:
URL: https://github.com/apache/hive/pull/1185

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 454517)
    Time Spent: 20m  (was: 10m)

> Remove unnecessary getLastFlushLength when checking delete delta files
> ----------------------------------------------------------------------
>
>                 Key: HIVE-23764
>                 URL: https://issues.apache.org/jira/browse/HIVE-23764
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls OrcAcidUtils.getLastFlushLength for every delete delta file.
> Even the comment says:
> {code}
>               // NOTE: Calling last flush length below is more for future-proofing when we have
>               // streaming deletes. But currently we don't support streaming deletes, and this can
>               // be removed if this becomes a performance issue.
> {code}
> For a table with 5 updates (1 base + 5 delta + 5 delete_delta), the reader for every base and delta directory checks all of the delete_delta directories and calls getLastFlushLength for each one, which results in 6*5=30 unnecessary NameNode (NN) or S3 calls.
> We should remove the check, as the comment already proposes.
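
The call count in the description can be reproduced with a small sketch. This is only an illustration of the arithmetic, not Hive code: the class and method names are hypothetical, and it assumes one file per delete_delta directory and one getLastFlushLength call per (reader, delete_delta) pair.

```java
public class FlushLengthCallCount {

    // Hypothetical helper (not part of Hive): every base or delta reader
    // checks each delete_delta once, and each check is assumed to trigger
    // one getLastFlushLength call against the NameNode or S3.
    static int unnecessaryCalls(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
        int readers = baseDirs + deltaDirs;
        return readers * deleteDeltaDirs;
    }

    public static void main(String[] args) {
        // 1 base + 5 delta + 5 delete_delta, as in the issue description.
        System.out.println(unnecessaryCalls(1, 5, 5)); // prints 30
    }
}
```

Under these assumptions the count grows with the product of readers and delete deltas, so the saving from removing the check grows as a table accumulates more updates.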



--
This message was sent by Atlassian Jira
(v8.3.4#803005)