Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/04 12:13:00 UTC

[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

     [ https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=703271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703271 ]

ASF GitHub Bot logged work on HIVE-25842:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jan/22 12:12
            Start Date: 04/Jan/22 12:12
    Worklog Time Spent: 10m 
      Work Description: laszlocsabapinter opened a new pull request #2915:
URL: https://github.com/apache/hive/pull/2915


   ### What changes were proposed in this pull request?
   Move delta metric collection from the Tez side to the compaction side. All delta file metrics are now collected during the initiator, worker, and cleaner phases.
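
To illustrate the idea of updating delta metrics in each compaction phase, here is a minimal sketch. It is not the code from the pull request: the class `DeltaMetricsSketch`, the `Kind` enum, and the `onInitiator`/`onWorker`/`onCleaner` methods are hypothetical names, and the bookkeeping (compacted deltas move from "active" to "obsolete" until the cleaner runs) is a simplified assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: class, enum, and method names are illustrative,
// not taken from the Hive code base.
final class DeltaMetricsSketch {
    enum Kind { ACTIVE_DELTAS, SMALL_DELTAS, OBSOLETE_DELTAS }

    // One gauge per (kind, table); entries with a zero count are dropped.
    private final Map<String, Integer> gauges = new ConcurrentHashMap<>();

    private static String key(Kind kind, String table) {
        return kind.name() + ":" + table;
    }

    private void set(Kind kind, String table, int count) {
        if (count <= 0) {
            gauges.remove(key(kind, table));
        } else {
            gauges.put(key(kind, table), count);
        }
    }

    int get(Kind kind, String table) {
        return gauges.getOrDefault(key(kind, table), 0);
    }

    // Initiator phase: record what was observed while checking the table.
    void onInitiator(String table, int activeDeltas, int smallDeltas) {
        set(Kind.ACTIVE_DELTAS, table, activeDeltas);
        set(Kind.SMALL_DELTAS, table, smallDeltas);
    }

    // Worker phase (assumed model): compacted deltas stop being active
    // and count as obsolete until the cleaner removes them.
    void onWorker(String table, int compactedDeltas, int remainingSmallDeltas) {
        int active = get(Kind.ACTIVE_DELTAS, table);
        int compacted = Math.min(active, Math.max(0, compactedDeltas));
        set(Kind.ACTIVE_DELTAS, table, active - compacted);
        set(Kind.SMALL_DELTAS, table, remainingSmallDeltas);
        set(Kind.OBSOLETE_DELTAS, table, get(Kind.OBSOLETE_DELTAS, table) + compacted);
    }

    // Cleaner phase: record how many obsolete deltas are left after cleaning.
    void onCleaner(String table, int remainingObsoleteDeltas) {
        set(Kind.OBSOLETE_DELTAS, table, remainingObsoleteDeltas);
    }
}
```

Because every phase writes the gauges itself, the reported numbers change as soon as compaction or cleaning finishes, without waiting for a query to touch the table.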
   
   ### Why are the changes needed?
   Metrics are collected only when a Tez query runs against a table (select * and select count(*) don't update the metrics).
   Metrics aren't updated after compaction or after the cleaning that follows compaction, so users will probably see compaction "issues" (such as many active, obsolete, or small deltas) that don't exist.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manual test, unit test
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 703271)
    Remaining Estimate: 0h
            Time Spent: 10m

> Reimplement delta file metric collection
> ----------------------------------------
>
>                 Key: HIVE-25842
>                 URL: https://issues.apache.org/jira/browse/HIVE-25842
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs against a table (select * and select count(*) don't update the metrics).
> Metrics aren't updated after compaction or after the cleaning that follows compaction, so users will probably see compaction "issues" (such as many active, obsolete, or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch around each method in DeltaFilesMetricsReporter but of course this isn't foolproof. This is a HUGE performance and functionality liability. Tests caught some issues, but our tests aren't perfect.
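
The try-catch guard described in the RISK note can be sketched roughly as follows. This is an illustration only, not the actual `DeltaFilesMetricsReporter` code: the class `GuardedMetricsSketch` and its methods are hypothetical. It also shows why such a guard is not foolproof: it swallows runtime exceptions, but it does nothing about `Error`s such as `OutOfMemoryError`, or about a metric update that is simply slow and delays the query it runs inside.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of guarding metric updates that run on the query path.
final class GuardedMetricsSketch {
    private final AtomicLong failures = new AtomicLong();

    // Runs the metric update and reports whether it succeeded.
    // Runtime exceptions are counted and swallowed so the caller can continue;
    // Errors and slow updates are NOT handled by this guard.
    boolean update(Runnable metricUpdate) {
        try {
            metricUpdate.run();
            return true;
        } catch (RuntimeException e) {
            failures.incrementAndGet();
            return false;
        }
    }

    long failureCount() {
        return failures.get();
    }
}
```

A caller would wrap each update, for example `guard.update(() -> reportDeltas(table))`, where `reportDeltas` stands for any hypothetical metric-reporting method.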


