Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2021/04/12 06:25:00 UTC
[jira] [Resolved] (HIVE-24746) PTF: TimestampValueBoundaryScanner
can be optimised during range computation
[ https://issues.apache.org/jira/browse/HIVE-24746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor resolved HIVE-24746.
---------------------------------
Resolution: Fixed
> PTF: TimestampValueBoundaryScanner can be optimised during range computation
> ----------------------------------------------------------------------------
>
> Key: HIVE-24746
> URL: https://issues.apache.org/jira/browse/HIVE-24746
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> During range computation, timestamp ranges become a hotspot because of Timestamp comparisons. Each comparison has to construct the entire Timestamp object via the ObjectInspector (OI), which internally incurs LocalTime computation, etc.
>
> All of this is done for an "equals" comparison, which can instead be done with the seconds and nanoseconds already present in the Timestamp.
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L852]
>
> The request is to explore optimising this code path so that equals() can be performed on seconds/nanoseconds instead of the entire timestamp.
>
> {noformat}
> at org.apache.hadoop.hive.common.type.Timestamp.setTimeInSeconds(Timestamp.java:133)
> at org.apache.hadoop.hive.serde2.io.TimestampWritableV2.populateTimestamp(TimestampWritableV2.java:401)
> at org.apache.hadoop.hive.serde2.io.TimestampWritableV2.getTimestamp(TimestampWritableV2.java:210)
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1239)
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1181)
> at org.apache.hadoop.hive.ql.udf.ptf.TimestampValueBoundaryScanner.isEqual(ValueBoundaryScanner.java:848)
> at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEndCurrentRow(ValueBoundaryScanner.java:593)
> at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEnd(ValueBoundaryScanner.java:530)
> at org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.getRange(BasePartitionEvaluator.java:273)
> at org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.iterate(BasePartitionEvaluator.java:219)
> at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:147)
> at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.access$100(WindowingTableFunction.java:61)
> at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction$WindowingIterator.next(WindowingTableFunction.java:755)
> at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:373)
> at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:104)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> {noformat}
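
A minimal sketch of the idea requested above, not the actual Hive patch: compare two timestamps by their raw seconds and nanoseconds fields instead of materialising a full Timestamp object for every comparison. RawTimestamp and isEqual below are hypothetical stand-ins invented for illustration; in Hive the two fields would come from the writable rather than from this class, and the null handling (two nulls compare equal) is an assumption.

```java
// Illustrative only: RawTimestamp is a hypothetical stand-in, not a Hive class.
final class SecondsNanosEquality {

    /** Holds only the two fields that an equality check needs. */
    static final class RawTimestamp {
        final long seconds; // seconds since the epoch
        final int nanos;    // nanosecond-of-second part

        RawTimestamp(long seconds, int nanos) {
            this.seconds = seconds;
            this.nanos = nanos;
        }
    }

    /**
     * Equality on (seconds, nanos) only. No date/time object is built,
     * so no calendar computation happens on the hot path.
     * Assumption: two nulls are equal, a null and a non-null are not.
     */
    static boolean isEqual(RawTimestamp a, RawTimestamp b) {
        if (a == null || b == null) {
            return a == null && b == null;
        }
        return a.seconds == b.seconds && a.nanos == b.nanos;
    }

    public static void main(String[] args) {
        RawTimestamp x = new RawTimestamp(1618208700L, 500);
        RawTimestamp y = new RawTimestamp(1618208700L, 500);
        RawTimestamp z = new RawTimestamp(1618208700L, 501);
        System.out.println(isEqual(x, y));
        System.out.println(isEqual(x, z));
    }
}
```

The trade-off is that this shortcut is only valid for equality: ordering or range-distance checks may still need more than a field-by-field comparison.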
--
This message was sent by Atlassian Jira
(v8.3.4#803005)