Posted to commits@harmony.apache.org by "Alexey Varlamov (JIRA)" <ji...@apache.org> on 2008/03/05 09:21:41 UTC
[jira] Commented: (HARMONY-4620) [drlvm][jit] Long return path for floating point values in calling convention
[ https://issues.apache.org/jira/browse/HARMONY-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575265#action_12575265 ]
Alexey Varlamov commented on HARMONY-4620:
------------------------------------------
Evgueni, AFAICS you've also fixed HARMONY-5152 in the suggested patch, thanks!
However, I do have compatibility concerns: do we really want to cut off support for non-SSE CPUs completely?
The best solution would be to define one more SSE-based managed CC and provide runtime customization to switch between them according to the host architecture. However, there is no such customization machinery available in the VM yet, so as a bare minimum we should keep a compile-time switch, e.g. an ifdef for the SSE/FPU return path.
Could you please add such ifdefs to the patch?
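To illustrate the kind of compile-time switch meant here, below is a minimal sketch in C. All names in it (RETURN_FP_IN_XMM, FpReturnLocation, fp_return_location) are hypothetical and are not taken from the DRLVM sources or from the attached patches; the sketch only shows the shape of an ifdef that selects between the SSE and FPU return paths.

```c
#include <assert.h>

/* Hypothetical switch, not a real DRLVM macro.
 * Defaults to the SSE path when the compiler targets SSE2,
 * and can be overridden on the command line. */
#ifndef RETURN_FP_IN_XMM
#  if defined(__SSE2__)
#    define RETURN_FP_IN_XMM 1
#  else
#    define RETURN_FP_IN_XMM 0
#  endif
#endif

/* Where a managed method leaves its float/double return value. */
typedef enum {
    RET_LOC_FP0,   /* top of the x87 FPU stack, st(0) */
    RET_LOC_XMM0   /* SSE register xmm0 */
} FpReturnLocation;

/* Return location chosen at compile time. */
static FpReturnLocation fp_return_location(void) {
#if RETURN_FP_IN_XMM
    return RET_LOC_XMM0;
#else
    return RET_LOC_FP0;
#endif
}

/* Human-readable register name for a return location. */
static const char *fp_return_location_name(FpReturnLocation loc) {
    assert(loc == RET_LOC_FP0 || loc == RET_LOC_XMM0);
    return loc == RET_LOC_XMM0 ? "xmm0" : "st(0)";
}
```

With such a switch, a build for non-SSE CPUs could pass -DRETURN_FP_IN_XMM=0 to the compiler to keep the FPU return path, while default builds on SSE2 targets would pick the xmm0 path.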
> [drlvm][jit] Long return path for floating point values in calling convention
> -----------------------------------------------------------------------------
>
> Key: HARMONY-4620
> URL: https://issues.apache.org/jira/browse/HARMONY-4620
> Project: Harmony
> Issue Type: Improvement
> Components: DRLVM
> Environment: appropriate for Intel architecture
> Reporter: Naumova Natalya
> Assignee: Mikhail Fursov
> Attachments: return_xmm.patch, return_xmm_2.patch, return_xmm_3.patch
>
>
> DRLVM has an overly long return path when the return value is floating point. The reason is that the calling convention mixes FPU usage with SSE instructions, giving "SSE -> mem -> FPU -> (return) mem -> SSE": the returned (double) value is first computed in xmm* registers, then copied to memory, then pushed onto the FPU stack, then popped from that stack (in the calling procedure) into memory again, and then the computation continues in xmm* registers (SSE instructions). This issue cancels out the improvement from loop unrolling: the parameter-passing overhead of this calling convention outweighs the speed-up from doubling the loop body. If you increase the "arg.optimizer.unroll.medium_loop_unroll_count" option for a method where the return value is double and it is in a loop, you get a degradation (example: the MonteCarlo benchmark in SciMark).
> Can we avoid using FPU with SSE in this case?