Posted to commits@harmony.apache.org by "Vladimir Strigun (JIRA)" <ji...@apache.org> on 2008/03/12 12:42:46 UTC

[jira] Created: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

[drlvm][jit][performance] new movapd instruction for register-to-register copy
------------------------------------------------------------------------------

                 Key: HARMONY-5599
                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
             Project: Harmony
          Issue Type: Improvement
          Components: DRLVM
            Reporter: Vladimir Strigun


Using the movapd instruction to copy between xmm registers is more efficient than the partial copy done by movsd, so the attached patch replaces movsd with movapd for such operations.
I've checked the patch on scimark bench [1] and got the following results (about 15% speedup for composite score):

orig build:

SciMark 2.0a

Composite Score: 236.8043350027899
FFT (1024): 266.4183025101507
SOR (100x100):   410.3833460433766
Monte Carlo : 31.43640457526972
Sparse matmult (N=1000, nz=5000): 208.14991492655557
LU (100x100): 267.6337069585971

with movapd:

SciMark 2.0a

Composite Score: 271.62584550328904
FFT (1024): 296.11079189672955
SOR (100x100):   458.00820213602486
Monte Carlo : 31.406979573247035
Sparse matmult (N=1000, nz=5000): 208.14991492655557
LU (100x100): 364.453338983888
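For reference, the "about 15%" figure can be rechecked from the numbers above. The sketch below is not part of the patch; the helper names (meanScore, speedupPercent) are made up for illustration. It relies on the observation that both reported composite scores match the arithmetic mean of the five kernel scores.

```cpp
#include <cassert>
#include <cstddef>

// Kernel scores copied from the report above, in the order
// FFT, SOR, Monte Carlo, Sparse matmult, LU.
const double kOrigScores[5] = {266.4183025101507, 410.3833460433766,
                               31.43640457526972, 208.14991492655557,
                               267.6337069585971};
const double kMovapdScores[5] = {296.11079189672955, 458.00820213602486,
                                 31.406979573247035, 208.14991492655557,
                                 364.453338983888};

// Arithmetic mean of n scores (hypothetical helper).
double meanScore(const double* scores, std::size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += scores[i];
    }
    return sum / static_cast<double>(n);
}

// Relative gain of `after` over `before`, in percent (hypothetical helper).
double speedupPercent(double before, double after) {
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

With these inputs the composite gain works out to roughly 14.7%, and most of it comes from LU (roughly 36%), with FFT and SOR gaining about 11% each. Sparse matmult is unchanged and Monte Carlo is essentially flat.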


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Mikhail Fursov (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578331#action_12578331 ] 

Mikhail Fursov commented on HARMONY-5599:
-----------------------------------------

The patch needs to be fixed.
Reason:
AFAIK MOVAPD is an SSE2 instruction, so we need to use the old (SSE) way if SSE2 is not available.

Check the Ia32i586InstsExpansion code to see how the CPUID check is performed.
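As a rough illustration of the kind of check meant here (this is not the Ia32i586InstsExpansion code, which is not quoted in this thread): CPUID leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26, so a feature level can be derived from that register value. The FpLevel enum and detectFpLevel name below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Feature bits in EDX as returned by CPUID leaf 1.
constexpr std::uint32_t kSseBit  = 1u << 25;
constexpr std::uint32_t kSse2Bit = 1u << 26;

// Hypothetical feature levels for this sketch, lowest to highest.
enum class FpLevel { X87, SSE, SSE2 };

// Map the EDX feature word to the best available floating-point level.
FpLevel detectFpLevel(std::uint32_t cpuidLeaf1Edx) {
    if ((cpuidLeaf1Edx & kSse2Bit) != 0) {
        return FpLevel::SSE2;
    }
    if ((cpuidLeaf1Edx & kSseBit) != 0) {
        return FpLevel::SSE;
    }
    return FpLevel::X87;
}
```

On GCC or Clang the EDX value could be obtained with __get_cpuid(1, &eax, &ebx, &ecx, &edx) from <cpuid.h>. Harmony's own CPUID class may expose this differently; its interface is not assumed here.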

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: HARMONY-5599.patch

[jira] Updated: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Mikhail Fursov (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Fursov updated HARMONY-5599:
------------------------------------

    Attachment: i586.diff

Bad news, everyone: we have a problem with MOVAPD.
Please check "ant reg.test -Dtest.case=H1578".


Note that if I disable the MOVAPD usage in movapd.diff:

    if (false && CPUID::isSSE2Supported()) {
        return newInst(Mnemonic_MOVAPD, targetOpnd, sourceOpnd);
    } else {
        return newInst(Mnemonic_MOVSD, targetOpnd, sourceOpnd);
    }

the test passes.



I'm going to commit only the i586 patch tomorrow. That patch does not contain the MOVSD->MOVAPD replacement.



> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch, i586.diff, movapd.diff

[jira] Commented: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Vladimir Strigun (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580365#action_12580365 ] 

Vladimir Strigun commented on HARMONY-5599:
-------------------------------------------

Thanks, Mikhail. The speedup for SciMark is the same as with the first version. I believe the issue should be closed.

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch, i586.diff, movapd.diff, movapd2.diff

[jira] Updated: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Vladimir Strigun (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Strigun updated HARMONY-5599:
--------------------------------------

    Attachment: HARMONY-5599.patch

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>         Attachments: HARMONY-5599.patch

[jira] Resolved: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Mikhail Fursov (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Fursov resolved HARMONY-5599.
-------------------------------------

    Resolution: Fixed

committed revision r637385

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch, i586.diff, movapd.diff, movapd2.diff

[jira] Updated: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Mikhail Fursov (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Fursov updated HARMONY-5599:
------------------------------------

    Attachment: movapd.diff

Vladimir,
I tried to fix all the issues in the patch and slightly refactored the SSE2->SSE->X87 lowerer logic.

Could I ask you to check that my patch does not affect performance?

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch, movapd.diff

[jira] Assigned: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Mikhail Fursov (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Fursov reassigned HARMONY-5599:
---------------------------------------

    Assignee: Mikhail Fursov

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: HARMONY-5599.patch

[jira] Commented: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Vladimir Strigun (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578464#action_12578464 ] 

Vladimir Strigun commented on HARMONY-5599:
-------------------------------------------

Thanks, Mikhail, for reviewing the patch. AFAIU, only one line needed to be added; it's fixed in the new version.

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch

[jira] Updated: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Mikhail Fursov (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Fursov updated HARMONY-5599:
------------------------------------

    Attachment: movapd2.diff

It looks like the reason for the failure is that I learned how MOVAPD works from the encoder (which today describes 128-bit types as 64-bit types) and enabled it for memory read/write ops.

I have now fixed the patch and made it work only for xmm regs.
Please check whether the performance improvement is the same.

The latest and complete patch so far is movapd2.diff.
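A minimal sketch of the selection rule described above, under the assumption that the choice depends only on operand kinds and SSE2 availability. The OpndKind and Mnemonic enums and the pickDoubleCopy name are hypothetical and do not reproduce the contents of movapd2.diff. MOVAPD always transfers the full 128 bits and requires a 16-byte-aligned memory operand, which is why an 8-byte double in memory stays on MOVSD here.

```cpp
#include <cassert>

// Hypothetical operand kinds and mnemonics for this sketch only.
enum class OpndKind { XmmReg, Mem };
enum class Mnemonic { MOVAPD, MOVSD };

// Pick the instruction for copying a double from src to dst.
// MOVAPD is used only for xmm-to-xmm copies when SSE2 is available;
// anything touching memory keeps the 64-bit MOVSD.
Mnemonic pickDoubleCopy(OpndKind dst, OpndKind src, bool sse2Supported) {
    // x86 has no memory-to-memory form of either instruction.
    assert(!(dst == OpndKind::Mem && src == OpndKind::Mem));
    const bool regToReg =
        dst == OpndKind::XmmReg && src == OpndKind::XmmReg;
    return (sse2Supported && regToReg) ? Mnemonic::MOVAPD : Mnemonic::MOVSD;
}
```

Keeping the rule in a single function means a load or store of a double can never be widened to 128 bits by accident.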

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch, i586.diff, movapd.diff, movapd2.diff

[jira] Closed: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Vladimir Strigun (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Strigun closed HARMONY-5599.
-------------------------------------


> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch, i586.diff, movapd.diff, movapd2.diff

[jira] Updated: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Vladimir Strigun (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Strigun updated HARMONY-5599:
--------------------------------------

    Attachment: H-5599.patch

New version of patch

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch, HARMONY-5599.patch

[jira] Updated: (HARMONY-5599) [drlvm][jit][performance] new movapd instruction for register-to-register copy

Posted by "Vladimir Strigun (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Strigun updated HARMONY-5599:
--------------------------------------

    Attachment:     (was: HARMONY-5599.patch)

> [drlvm][jit][performance] new movapd instruction for register-to-register copy
> ------------------------------------------------------------------------------
>
>                 Key: HARMONY-5599
>                 URL: https://issues.apache.org/jira/browse/HARMONY-5599
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Vladimir Strigun
>            Assignee: Mikhail Fursov
>         Attachments: H-5599.patch