Posted to issues@hive.apache.org by "Teddy Choi (JIRA)" <ji...@apache.org> on 2017/02/21 11:05:44 UTC
[jira] [Comment Edited] (HIVE-15824) Loop optimization for SIMD in double comparisons
[ https://issues.apache.org/jira/browse/HIVE-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875794#comment-15875794 ]
Teddy Choi edited comment on HIVE-15824 at 2/21/17 11:05 AM:
-------------------------------------------------------------
I made a draft implementation with Double.doubleToRawLongBits, and it is slower than the original. Unsafe.getLong may be an alternative.
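For context, here is a minimal sketch of what a doubleToRawLongBits-based equality loop could look like. It is not the draft patch itself: the class and method names are hypothetical, and the branch-free expression is just one possible formulation. Note that comparing raw bits is not the same as the `==` operator on doubles: 0.0 and -0.0 have different bit patterns, and two NaN values with identical bits compare as equal here.

```java
// Hypothetical sketch, not the HIVE-15824 patch.
final class DoubleEqualsSketch {

    // Returns 1 when a and b have identical raw bit patterns, else 0, without branching.
    static long bitsEqual(double a, double b) {
        long x = Double.doubleToRawLongBits(a) ^ Double.doubleToRawLongBits(b);
        // x is 0 only when the bit patterns match; (x | -x) has its sign bit set iff x != 0.
        return ((x | -x) >>> 63) ^ 1L;
    }

    // Column-vs-column comparison over the first n entries; writes 1 or 0 into out.
    static void equalColumns(double[] left, double[] right, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = bitsEqual(left[i], right[i]);
        }
    }
}
```

The loop body contains no conditional, which is the property a loop optimization for SIMD usually aims for; whether the JIT actually vectorizes it has to be measured, as the numbers below show.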
Before
{noformat}
o.a.h.b.v.VectorizedComparisonBench.DoubleColEqualDoubleColumnBench.bench avgt 2 1114.343 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColEqualDoubleScalarBench.bench avgt 2 857.549 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColGreaterDoubleColumnBench.bench avgt 2 1410.078 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColGreaterDoubleScalarBench.bench avgt 2 705.738 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColGreaterEqualDoubleColumnBench.bench avgt 2 1134.613 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColGreaterEqualDoubleScalarBench.bench avgt 2 685.269 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColLessDoubleColumnBench.bench avgt 2 1248.419 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColLessDoubleScalarBench.bench avgt 2 664.593 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColLessEqualDoubleColumnBench.bench avgt 2 1048.175 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColLessEqualDoubleScalarBench.bench avgt 2 703.839 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColNotEqualDoubleColumnBench.bench avgt 2 1042.366 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColNotEqualDoubleScalarBench.bench avgt 2 823.484 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarEqualDoubleColumnBench.bench avgt 2 913.149 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarGreaterDoubleColumnBench.bench avgt 2 692.095 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarGreaterEqualDoubleColumnBench.bench avgt 2 3446.145 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarLessDoubleColumnBench.bench avgt 2 685.639 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarLessEqualDoubleColumnBench.bench avgt 2 683.063 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarNotEqualDoubleColumnBench.bench avgt 2 815.801 ± NaN ms/op
{noformat}
After
{noformat}
o.a.h.b.v.VectorizedComparisonBench.DoubleColEqualDoubleColumnBench.bench avgt 2 2042.958 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColEqualDoubleScalarBench.bench avgt 2 1448.942 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColGreaterDoubleColumnBench.bench avgt 2 1413.319 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColGreaterDoubleScalarBench.bench avgt 2 806.678 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColGreaterEqualDoubleColumnBench.bench avgt 2 1512.124 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColGreaterEqualDoubleScalarBench.bench avgt 2 968.015 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColLessDoubleColumnBench.bench avgt 2 1395.853 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColLessDoubleScalarBench.bench avgt 2 818.353 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColLessEqualDoubleColumnBench.bench avgt 2 1493.037 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColLessEqualDoubleScalarBench.bench avgt 2 946.533 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColNotEqualDoubleColumnBench.bench avgt 2 1860.316 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleColNotEqualDoubleScalarBench.bench avgt 2 1356.694 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarEqualDoubleColumnBench.bench avgt 2 1607.872 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarGreaterDoubleColumnBench.bench avgt 2 790.920 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarGreaterEqualDoubleColumnBench.bench avgt 2 971.054 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarLessDoubleColumnBench.bench avgt 2 795.728 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarLessEqualDoubleColumnBench.bench avgt 2 958.495 ± NaN ms/op
o.a.h.b.v.VectorizedComparisonBench.DoubleScalarNotEqualDoubleColumnBench.bench avgt 2 1496.665 ± NaN ms/op
{noformat}
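To read the two listings side by side, the after/before ratio per benchmark can be computed with a small helper like the sketch below. The class and method names are hypothetical, and the parsing assumes the column layout shown above (name, mode, count, score, ±, error, unit). With only two iterations and no error estimate (± NaN), the ratios are rough; the "before" score of DoubleScalarGreaterEqualDoubleColumnBench in particular stands well apart from its neighbours and may be an outlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for comparing two JMH result listings.
final class BenchRatioSketch {

    // First whitespace-separated field: the benchmark name.
    static String name(String line) {
        return line.trim().split("\\s+")[0];
    }

    // Fourth whitespace-separated field: the score in ms/op.
    static double score(String line) {
        return Double.parseDouble(line.trim().split("\\s+")[3]);
    }

    // Maps each benchmark name to after/before; a value above 1.0 means slower after the change.
    static Map<String, Double> slowdown(String[] before, String[] after) {
        Map<String, Double> base = new LinkedHashMap<>();
        for (String line : before) {
            base.put(name(line), score(line));
        }
        Map<String, Double> ratio = new LinkedHashMap<>();
        for (String line : after) {
            Double b = base.get(name(line));
            if (b != null) {
                ratio.put(name(line), score(line) / b);
            }
        }
        return ratio;
    }

    public static void main(String[] args) {
        // Two lines copied from each listing above; \u00b1 is the plus-minus sign.
        String p = "o.a.h.b.v.VectorizedComparisonBench.";
        String[] before = {
            p + "DoubleColEqualDoubleColumnBench.bench avgt 2 1114.343 \u00b1 NaN ms/op",
            p + "DoubleScalarGreaterEqualDoubleColumnBench.bench avgt 2 3446.145 \u00b1 NaN ms/op"
        };
        String[] after = {
            p + "DoubleColEqualDoubleColumnBench.bench avgt 2 2042.958 \u00b1 NaN ms/op",
            p + "DoubleScalarGreaterEqualDoubleColumnBench.bench avgt 2 971.054 \u00b1 NaN ms/op"
        };
        slowdown(before, after).forEach((k, v) -> System.out.printf("%s %.2fx%n", k, v));
    }
}
```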
> Loop optimization for SIMD in double comparisons
> ------------------------------------------------
>
> Key: HIVE-15824
> URL: https://issues.apache.org/jira/browse/HIVE-15824
> Project: Hive
> Issue Type: Sub-task
> Reporter: Teddy Choi
> Assignee: Teddy Choi
>
> Use Double.doubleToRawLongBits, following the approach of HIVE-11533, to optimize double comparisons.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)