Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2021/04/12 06:25:00 UTC
[jira] [Resolved] (HIVE-24746) PTF: TimestampValueBoundaryScanner can be optimised during range computation

     [ https://issues.apache.org/jira/browse/HIVE-24746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor resolved HIVE-24746.
---------------------------------
    Resolution: Fixed

> PTF: TimestampValueBoundaryScanner can be optimised during range computation
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-24746
>                 URL: https://issues.apache.org/jira/browse/HIVE-24746
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> During range computation, timestamp ranges become a hotspot because of Timestamp comparisons. Each comparison has to construct an entire Timestamp object via the ObjectInspector (OI), which internally incurs LocalDateTime computation, among other costs.
>  
> All of this work is done just for an "equals" comparison, which could instead be performed with the seconds and nanoseconds already present in the Timestamp.
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L852] 
>  
> The request is to explore optimising this code path so that equals() can be performed on seconds/nanoseconds instead of on the entire timestamp.
>  
> {noformat}
> at org.apache.hadoop.hive.common.type.Timestamp.setTimeInSeconds(Timestamp.java:133)
> 	at org.apache.hadoop.hive.serde2.io.TimestampWritableV2.populateTimestamp(TimestampWritableV2.java:401)
> 	at org.apache.hadoop.hive.serde2.io.TimestampWritableV2.getTimestamp(TimestampWritableV2.java:210)
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1239)
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1181)
> 	at org.apache.hadoop.hive.ql.udf.ptf.TimestampValueBoundaryScanner.isEqual(ValueBoundaryScanner.java:848)
> 	at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEndCurrentRow(ValueBoundaryScanner.java:593)
> 	at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEnd(ValueBoundaryScanner.java:530)
> 	at org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.getRange(BasePartitionEvaluator.java:273)
> 	at org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.iterate(BasePartitionEvaluator.java:219)
> 	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:147)
> 	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.access$100(WindowingTableFunction.java:61)
> 	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction$WindowingIterator.next(WindowingTableFunction.java:755)
> 	at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:373)
> 	at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:104)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>  {noformat}
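A minimal, self-contained Java sketch of the idea in the issue: compare two timestamps on their raw seconds/nanoseconds rather than first building full objects. It does not use Hive's classes. RawTimestamp, slowEquals and fastEquals are hypothetical names, and java.time.LocalDateTime stands in for the Timestamp object that the real code path constructs. Null handling, which the real boundary scanner also has to cover, is omitted.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch, not Hive's actual implementation.
public class TimestampEqualitySketch {

    // Hypothetical holder for the raw fields of a serialized timestamp.
    static final class RawTimestamp {
        final long seconds; // seconds since the epoch
        final int nanos;    // nanosecond-of-second, expected in 0..999_999_999

        RawTimestamp(long seconds, int nanos) {
            this.seconds = seconds;
            this.nanos = nanos;
        }

        // Slow path: materialize a full date-time object
        // (stand-in for building a Timestamp via the ObjectInspector).
        LocalDateTime toLocalDateTime() {
            return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
        }
    }

    // Slow path: construct both objects, then compare them.
    static boolean slowEquals(RawTimestamp a, RawTimestamp b) {
        return a.toLocalDateTime().equals(b.toLocalDateTime());
    }

    // Fast path: compare the raw fields directly, with no object construction.
    static boolean fastEquals(RawTimestamp a, RawTimestamp b) {
        return a.seconds == b.seconds && a.nanos == b.nanos;
    }

    public static void main(String[] args) {
        RawTimestamp t1 = new RawTimestamp(1_618_208_700L, 123_000_000);
        RawTimestamp t2 = new RawTimestamp(1_618_208_700L, 123_000_000);
        RawTimestamp t3 = new RawTimestamp(1_618_208_700L, 124_000_000);
        System.out.println(fastEquals(t1, t2) + " " + slowEquals(t1, t2)); // true true
        System.out.println(fastEquals(t1, t3) + " " + slowEquals(t1, t3)); // false false
    }
}
```

The two comparisons agree as long as nanos stays within 0..999,999,999, because each (seconds, nanos) pair then maps to exactly one date-time value. The fast path simply skips the object construction that appears at the top of the stack trace above.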



--
This message was sent by Atlassian Jira
(v8.3.4#803005)