Posted to issues@hive.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2019/11/01 14:40:00 UTC

[jira] [Commented] (HIVE-22402) Deprecate and Replace Hive PerfLogger

    [ https://issues.apache.org/jira/browse/HIVE-22402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964876#comment-16964876 ] 

David Mollitor commented on HIVE-22402:
---------------------------------------

Hello watchers.  Any thoughts on this? :)

> Deprecate and Replace Hive PerfLogger
> -------------------------------------
>
>                 Key: HIVE-22402
>                 URL: https://issues.apache.org/jira/browse/HIVE-22402
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HIVE-22402.1.patch, HIVE-22402.2.patch, HIVE-22402.3.patch, HIVE-22402.4.patch
>
>
> Recently I wanted to add more performance logging, with some additional capabilities, to support my troubleshooting efforts. I started looking at PerfLogger and examining its usage. I discovered a few things:
>  # Since 'loggers' must be opened and closed manually, I found a couple of places where loggers were opened but never closed, rendering them useless
>  # Since 'loggers' must be closed manually, I found a few places where an early return or a thrown Exception would cause a logger to not be closed, thereby rendering it useless
>  # Session information is not logged, so it can be difficult to precisely pinpoint which session is taking lots of time
>  # PerfLogger is overloaded. Most of the time, it is used as a simple timer mechanism with automatic logging at SLF4J debug level. However, it is also a facade over the Hive Metrics subsystem: timing results are automatically published to Metrics, which creates a dependency on a 'logger' just to access metric data.
> The last bullet is the most challenging part and is why I propose to deprecate the Hive {{PerfLogger}} rather than simply remove it. I am proposing a new system: a {{PerfTimer}} that supports Java's try-with-resources feature, so that the developer does not have to manually close measurements or carefully consider every early exit. The base implementation logs to SLF4J. An extended version automatically publishes to the Hive Metrics subsystem as well.
> The current Hive {{PerfLogger}} has a somewhat clunky system for allowing pluggable implementations. However, the current default implementation has the side effect of also publishing timing information to the Hive Metrics subsystem. There are code sections that look up various timers in the Metrics subsystem and publish the results back to the client. Since, in theory, the implementation is pluggable, any other implementation that does not share this side effect of publishing to the Metrics subsystem will break these non-optional code paths.  Also, these code paths create and interact with {{PerfLogger}} instances in a static way, and then the publishing code pulls the data from the {{PerfLogger}} (as a facade to the Metrics subsystem) in a static way. Therefore, when I tried to replace the entire {{PerfLogger}} code, I ran into a problem: there is not (and should not be) a way to statically pull this information down from any point in the code. Information that is required for publishing should be passed around within some sort of context object, separate from the Metrics subsystem. There was no obvious way to thread a new {{PerfTimer}} through to all the required locations. I propose marking the {{PerfLogger}} as deprecated and leaving these complex sections alone. Instead, replace only the simple "I want a timer" use cases.
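
As an illustration of the try-with-resources idea in the description above, here is a minimal sketch. It is not the class from the attached patches: the constructor signature, the {{sessionId}} field, the log line format, and the {{PerfTimerDemo}} helper are all assumptions made for this example. A {{Consumer<String>}} stands in for the SLF4J debug logger so the snippet has no dependencies, and the Metrics-publishing extension is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a try-with-resources timer; not the class from the attached patches. */
final class PerfTimer implements AutoCloseable {
    private final String name;
    private final String sessionId;       // assumed field, so the log line can identify the session
    private final Consumer<String> sink;  // stands in for an SLF4J debug logger
    private final long startNanos;
    private boolean closed;

    PerfTimer(String name, String sessionId, Consumer<String> sink) {
        this.name = name;
        this.sessionId = sessionId;
        this.sink = sink;
        this.startNanos = System.nanoTime();
    }

    /** Runs automatically when the try block exits, including on early return or exception. */
    @Override
    public void close() {
        if (closed) {
            return; // make repeated close() calls harmless
        }
        closed = true;
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        sink.accept("session=" + sessionId + " timer=" + name + " elapsedMs=" + elapsedMs);
    }
}

final class PerfTimerDemo {
    /** Returns early on empty input; the timer is still closed and its measurement logged. */
    static int countWords(String query, Consumer<String> sink) {
        try (PerfTimer timer = new PerfTimer("countWords", "session-1", sink)) {
            if (query.isEmpty()) {
                return 0; // early exit: close() still runs
            }
            return query.trim().split("\\s+").length;
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        countWords("", lines::add);          // early-return path
        countWords("select 1", lines::add);  // normal path
        lines.forEach(System.out::println);  // one log line per call
    }
}
```

Both calls produce a log line even though the first returns early, which is the failure mode items 1 and 2 describe for manually closed loggers. Passing the session identifier and the sink in through the constructor, instead of looking them up statically, is one way to follow the "context object" suggestion.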



--
This message was sent by Atlassian Jira
(v8.3.4#803005)