Posted to issues@hive.apache.org by "Sanjar Akhmedov (Jira)" <ji...@apache.org> on 2019/11/11 13:27:00 UTC
[jira] [Updated] (HIVE-22477) Avro logical type timestamp conversion is slow
[ https://issues.apache.org/jira/browse/HIVE-22477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanjar Akhmedov updated HIVE-22477:
-----------------------------------
Attachment: flamegraph.svg
> Avro logical type timestamp conversion is slow
> ----------------------------------------------
>
> Key: HIVE-22477
> URL: https://issues.apache.org/jira/browse/HIVE-22477
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.1.0
> Environment: Hive 3.1.0
> Reporter: Sanjar Akhmedov
> Priority: Major
> Labels: Performance
> Attachments: flamegraph.svg
>
>
> We have an Avro-backed table with hundreds of billions of timestamps. A simple {{SELECT COUNT(*) FROM t}} query takes many hours to complete in version 3.1.0, versus tens of minutes in version 1.2.1.
> Looking at the attached flamegraph of one of the YARN containers, Hive is spending most of its time throwing exceptions during Avro timestamp conversion.
> It is generally a good idea to avoid throwing exceptions in performance-critical sections: creating an exception is expensive, and repeating that cost for every row or value in a query can have drastic performance implications.
> As far as I can see, there is no reason to convert a numeric timestamp to a string and pass it through the very lenient {{org.apache.hadoop.hive.common.type.TimestampTZUtil#parse(java.lang.String, java.time.ZoneId)}} just to do the time zone conversion.
> This patch changes the conversion of {{Date}} and {{Timestamp}} to {{TimestampTZ}} so that it no longer invokes {{parse}}.
> JMH timings before:
> {code:java}
> Benchmark                              Mode  Cnt      Score   Error  Units
> TimestampTZUtilBench.convertDate       avgt    2  10091.990          ns/op
> TimestampTZUtilBench.convertTimestamp  avgt    2  10657.596          ns/op
> {code}
> JMH timings after:
> {code:java}
> Benchmark                              Mode  Cnt   Score   Error  Units
> TimestampTZUtilBench.convertDate       avgt    2  48.371          ns/op
> TimestampTZUtilBench.convertTimestamp  avgt    2  51.170          ns/op
> {code}
> JMH stack profile before:
> {code:java}
> Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertDate:·stack":
> Stack profiler:
> ....[Thread state distributions]....................................................................
> 100.0% RUNNABLE
> ....[Thread state: RUNNABLE]........................................................................
> 97.4% 97.4% java.lang.Throwable.fillInStackTrace
> 1.6% 1.6% java.time.format.DateTimeFormatter.parse
> 0.2% 0.2% java.time.ZoneId.from
> 0.1% 0.1% java.util.HashMap.hash
> 0.1% 0.1% java.lang.Number.<init>
> 0.1% 0.1% java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format
> 0.1% 0.1% java.lang.StringBuilder.append
> 0.1% 0.1% java.util.HashMap.putVal
> 0.1% 0.1% java.lang.String.valueOf
> 0.1% 0.1% java.util.regex.Pattern$BmpCharProperty.match
> 0.2% 0.2% <other>
> ...
> Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertTimestamp:·stack":
> Stack profiler:
> ....[Thread state distributions]....................................................................
> 100.0% RUNNABLE
> ....[Thread state: RUNNABLE]........................................................................
> 96.5% 96.5% java.lang.Throwable.fillInStackTrace
> 1.0% 1.0% java.time.format.DateTimeFormatter.parse
> 0.6% 0.6% org.apache.hadoop.hive.common.type.TimestampTZUtil.parse
> 0.4% 0.4% java.time.ZoneId.from
> 0.2% 0.2% java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format
> 0.2% 0.2% java.time.format.Parsed.resolveFields
> 0.2% 0.2% java.lang.String.valueOf
> 0.1% 0.1% java.lang.StringBuilder.append
> 0.1% 0.1% java.util.HashMap.hash
> 0.1% 0.1% java.time.format.DateTimeParseContext.toResolved
> 0.6% 0.6% <other>
> {code}
> JMH stack profile after:
> {code:java}
> Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertDate:·stack":
> Stack profiler:
> ....[Thread state distributions]....................................................................
> 100.0% RUNNABLE
> ....[Thread state: RUNNABLE]........................................................................
> 91.6% 91.6% java.time.ZonedDateTime.ofInstant
> 8.0% 8.0% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertDate_jmhTest.convertDate_avgt_jmhStub
> 0.1% 0.1% java.time.zone.ZoneRules.<init>
> 0.1% 0.1% java.time.LocalDateTime.ofEpochSecond
> 0.1% 0.1% org.apache.hadoop.hive.common.type.TimestampTZUtil.convert
> 0.1% 0.1% java.time.LocalDate.ofEpochDay
> 0.1% 0.1% java.time.ZonedDateTime.create
> ...
> Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertTimestamp:·stack":
> Stack profiler:
> ....[Thread state distributions]....................................................................
> 100.0% RUNNABLE
> ....[Thread state: RUNNABLE]........................................................................
> 90.7% 90.7% java.time.ZonedDateTime.ofInstant
> 9.0% 9.0% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertTimestamp_jmhTest.convertTimestamp_avgt_jmhStub
> 0.1% 0.1% java.time.zone.ZoneRules.<init>
> 0.1% 0.1% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertTimestamp_jmhTest.convertTimestamp_AverageTime
> 0.1% 0.1% java.time.LocalDateTime.ofEpochSecond
> 0.1% 0.1% java.time.LocalDate.ofEpochDay
> 0.1% 0.1% java.time.ZonedDateTime.create
> {code}
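For illustration, the two conversion shapes discussed in the description could be sketched roughly as follows. This is not the attached patch and does not use Hive classes: it relies only on java.time, the class and method names (DirectZoneConversion, convertViaString, convertDirect) are hypothetical, and it assumes the timestamp is a zone-less value given as epoch seconds plus nanoseconds. The string path here uses a strict ISO formatter, so it shows the round trip but does not reproduce the exception-throwing behaviour of the lenient Hive parser.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; not code from Hive or from the attached patch.
public class DirectZoneConversion {

    // String round trip: render the zone-less timestamp as text, parse it back,
    // then attach the target zone.
    static ZonedDateTime convertViaString(long epochSecond, int nanos, ZoneId zone) {
        LocalDateTime local = LocalDateTime.ofEpochSecond(epochSecond, nanos, ZoneOffset.UTC);
        String text = DateTimeFormatter.ISO_LOCAL_DATE_TIME.format(local);
        return LocalDateTime.parse(text, DateTimeFormatter.ISO_LOCAL_DATE_TIME).atZone(zone);
    }

    // Direct path: build the date-time from the numeric value and re-label it
    // with the target zone, keeping the same local date and time. No strings,
    // no parsing.
    static ZonedDateTime convertDirect(long epochSecond, int nanos, ZoneId zone) {
        return ZonedDateTime.ofInstant(Instant.ofEpochSecond(epochSecond, nanos), ZoneOffset.UTC)
                .withZoneSameLocal(zone);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles");
        long epochSecond = 1573478820L; // 2019-11-11T13:27:00 when read as UTC
        System.out.println(convertViaString(epochSecond, 0, zone));
        System.out.println(convertDirect(epochSecond, 0, zone));
    }
}
```

Both methods should return the same value for unambiguous local times; the direct one simply avoids the formatting and parsing work on every row.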
--
This message was sent by Atlassian Jira
(v8.3.4#803005)