Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2015/04/07 09:34:12 UTC
[jira] [Commented] (HIVE-10180) Loop optimization for SIMD in ColumnArithmeticColumn.txt

    [ https://issues.apache.org/jira/browse/HIVE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482746#comment-14482746 ] 

Gopal V commented on HIVE-10180:
--------------------------------

[~chengxiang li]: patch looks self-explanatory, waiting for test runs.

I tested bigint+bigint on TPC-H to try this out, and I'm seeing better assembly generated for the inner loop (but not AVX2: it uses 128-bit xmm registers rather than 256-bit ymm, so maybe I need a later JDK8?).

{code}
    7fa5b6eced29:       cmp    %r8d,%r10d
    7fa5b6eced2c:       jge    0x7fa5b6eced8a
    7fa5b6eced2e:       xchg   %ax,%ax
    7fa5b6eced30:       vmovdqu 0x10(%r11,%r10,8),%xmm0
    7fa5b6eced37:       vpaddq 0x10(%rdx,%r10,8),%xmm0,%xmm0
    7fa5b6eced3e:       vmovdqu %xmm0,0x10(%rcx,%r10,8)
    7fa5b6eced45:       movslq %r10d,%rsi
    7fa5b6eced48:       vmovdqu 0x20(%r11,%rsi,8),%xmm0
    7fa5b6eced4f:       vpaddq 0x20(%rdx,%rsi,8),%xmm0,%xmm0
    7fa5b6eced55:       vmovdqu %xmm0,0x20(%rcx,%rsi,8)
    7fa5b6eced5b:       vmovdqu 0x30(%r11,%rsi,8),%xmm0
    7fa5b6eced62:       vpaddq 0x30(%rdx,%rsi,8),%xmm0,%xmm0
    7fa5b6eced68:       vmovdqu %xmm0,0x30(%rcx,%rsi,8)
    7fa5b6eced6e:       vmovdqu 0x40(%r11,%rsi,8),%xmm0
    7fa5b6eced75:       vpaddq 0x40(%rdx,%rsi,8),%xmm0,%xmm0
    7fa5b6eced7b:       vmovdqu %xmm0,0x40(%rcx,%rsi,8)
    7fa5b6eced81:       add    $0x8,%r10d
    7fa5b6eced85:       cmp    %r8d,%r10d
    7fa5b6eced88:       jl     0x7fa5b6eced30
    7fa5b6eced8a:       cmp    0x14(%rsp),%r10d
    7fa5b6eced8f:       je     0x7fa5b6ecedc8
{code}

Looks like there's a branch miss on the jl back to the beginning of the loop.

Trying to get a Linux perf cycle count for this, to confirm whether that's real.
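
For reference, the Java loop behind this assembly presumably has the shape sketched below. This is an assumption: the generated Hive source is not quoted in this comment, and the class and method names are hypothetical. Each vpaddq on an xmm register adds two 64-bit values, so the four load/add/store groups cover eight elements, which matches the `add $0x8` per iteration.

```java
// Hypothetical sketch of a bigint + bigint column addition loop.
// Not copied from Hive; names are illustrative only.
final class LongColumnAddSketch {
    // Adds the first n elements of vector1 and vector2 into outputVector.
    static void add(long[] vector1, long[] vector2, long[] outputVector, int n) {
        for (int i = 0; i != n; i++) {
            outputVector[i] = vector1[i] + vector2[i];
        }
    }
}
```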

> Loop optimization for SIMD in ColumnArithmeticColumn.txt
> --------------------------------------------------------
>
>                 Key: HIVE-10180
>                 URL: https://issues.apache.org/jira/browse/HIVE-10180
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>         Attachments: HIVE-10180.1.patch, HIVE-10180.2.patch
>
>
> The JVM is quite strict about the code patterns it will execute with SIMD instructions. Take a loop in DoubleColAddDoubleColumn.java, for example:
> {code:java}
> for (int i = 0; i != n; i++) {
>   outputVector[i] = vector1[0] + vector2[i];
> }
> {code}
> The "vector1[0]" reference prevents the JVM from executing this code with vectorized instructions; we need to assign "vector1[0]" to a variable outside the loop and use that variable inside the loop.
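
A minimal sketch of the rewrite described in the issue, assuming plain double[] arrays. The class and method names are hypothetical and are not taken from the attached patches. One plausible reason the JIT does not hoist the load on its own is that outputVector could alias vector1, so a store to outputVector[0] might change vector1[0].

```java
// Hypothetical sketch, not the code from the HIVE-10180 patches.
// Assumes vector1 has at least one element and that outputVector
// does not alias vector1.
final class HoistedScalarAdd {
    // Original shape: vector1[0] is re-read inside the loop body.
    static void addInLoop(double[] vector1, double[] vector2,
                          double[] outputVector, int n) {
        for (int i = 0; i != n; i++) {
            outputVector[i] = vector1[0] + vector2[i];
        }
    }

    // Rewritten shape: the loop-invariant element is read once into a
    // local variable, leaving a simple array loop for the JIT.
    static void addHoisted(double[] vector1, double[] vector2,
                           double[] outputVector, int n) {
        final double vector1Value = vector1[0];
        for (int i = 0; i != n; i++) {
            outputVector[i] = vector1Value + vector2[i];
        }
    }
}
```

Both methods produce the same output when the arrays are distinct; whether the second one is actually vectorized depends on the JDK version and CPU.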


