Posted to dev@harmony.apache.org by bu qi cheng <bu...@gmail.com> on 2008/10/12 03:18:26 UTC
[drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses
Hi Aleksey:
Here is the performance data. The build we used is the one I checked out
on August 4. We did not run startup.*, and many benchmarks still fail to
run (not because of the patch). From this data, 800 looks suitable for
MAX_INLINE_GROWTH_FACTOR_PROF. I am not sure what data we would get from
benchmarks other than SPECjvm2008. Can you give more information on why
MAX_INLINE_GROWTH_FACTOR_PROF = 2000?
Configurations: CLEAN, MAX_INLINE_GROWTH_FACTOR_PROF = 800 and
MAX_INLINE_GROWTH_FACTOR_PROF = 2000. The last two columns are the change
relative to Clean.

Benchmark              Clean    800      2000      800 vs Clean  2000 vs Clean
crypto.aes             39.59    38.22    37.08     -3.46%        -6.34%
crypto.rsa             193.24   172.08   178.08    -10.95%       -7.85%
crypto.signverify      118.6    109.71   107.61    -7.50%        -9.27%
compiler.compiler      93.86    91.25    87.61     -2.78%        -6.66%
compiler.sunflow       139.63   123.64   136.65    -11.45%       -2.13%
scimark.fft.large      14.8     14.93    15.92     0.88%         7.57%
scimark.sor.large      21.69    21.71    21.65     0.09%         -0.18%
scimark.sparse.large   12.86    12.88    12.85     0.16%         -0.08%
scimark.monte_carlo    298.17   977.29   1024.55   227.76%       243.61%
derby                  42.73    40.13    41.8      -6.08%        -2.18%
compress               117.23   111.12   110.12    -5.21%        -6.07%
xml.validation         81.17    82.15    80.75     1.21%         -0.52%
scimark.fft.small      931.98   931.98   903.96    0.00%         -3.01%
scimark.lu.small       842.59   831.09   841.83    -1.36%        -0.09%
scimark.sparse.small   70.95    65.5     70.7      -7.68%        -0.35%
serial                 8.32     8        8.23      -3.85%        -1.08%
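
As a quick check on how the last two columns are derived, the sketch below
recomputes them as the relative change against the Clean score. The function
names are made up for illustration and are not part of any benchmark harness.

```python
def relative_change(clean, patched):
    """Change of the patched score relative to the clean score, as a fraction."""
    return (patched - clean) / clean


def as_percent(fraction):
    """Format a fraction the way the table prints it, e.g. -0.0346 -> '-3.46%'."""
    return f"{fraction * 100:.2f}%"


# Scores taken from the table above: (Clean, 800 build)
crypto_aes = (39.59, 38.22)
monte_carlo = (298.17, 977.29)

print(as_percent(relative_change(*crypto_aes)))
print(as_percent(relative_change(*monte_carlo)))
```

The same formula, without the percent formatting, gives the fractional
"Improvement" values used in the later tables of this thread.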
Another problem: as you mentioned, escape-analysis-directed inlining is
more suitable for this case. So I am wondering whether it is appropriate to
commit the whole patch. Maybe it is better to commit only the adjustment of
MAX_INLINE_GROWTH_FACTOR_PROF, which will introduce about 30% performance
improvement in monte_carlo. However, redesigning the optimizations of
Harmony would be hard work. One general consideration is to move the basic
analyses (such as escape analysis, which determines the live range and
scope of objects) toward the front of the pipeline. What do you think?
Thanks!
Buqi
Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses
Posted by bu qi cheng <bu...@gmail.com>.
Aleksey:
Thanks for the help. No problem, I will update the patch.
Thanks!
Buqi
Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses
Posted by Aleksey Shipilev <al...@gmail.com>.
Thanks, Bu Qi!
I'm fine with the "max_level=2" approach. We may review the
"Random.initialize()" issue more thoroughly later. Would you please
come up with a clean patch?
Thanks again,
Aleksey.
On Wed, Oct 15, 2008 at 2:09 PM, bu qi cheng <bu...@gmail.com> wrote:
> Hi Aleksey:
>
> Sorry for the confusion. The data for "nextDouble" is as follows, where:
> "clean" = no inlining + no sync elimination + no scalar replacement.
> "max_level=2" = inlining + sync elimination, with escape analysis
> covering 3 levels of methods in the call graph.
> "InstanceInitialize inline" = inlining + inlining of "Random.initializer"
> + sync elimination and scalar replacement, with escape analysis covering
> 1 level (the method itself).
>
> Benchmark              clean    max_level=2  InstanceInitialize  Improvement:  Improvement:
>                                              inline              max_level=2   InstanceInitialize inline
> crypto.aes             39.59    37.79        38.22               -0.045466027  -0.034604698
> crypto.rsa             193.24   178.11       172.08              -0.078296419  -0.109501138
> crypto.signverify      118.6    111.5        109.71              -0.059865093  -0.074957841
> compiler.compiler      93.86    95.2         91.25               0.014276582   -0.027807373
> compiler.sunflow       139.63   133.45       123.64              -0.04425983   -0.114516938
> scimark.fft.large      14.8     15.01        14.93               0.014189189   0.008783784
> scimark.sor.large      21.69    21.67        21.71               -0.000922084  0.000922084
> scimark.sparse.large   12.86    12.77        12.88               -0.006998445  0.00155521
> scimark.monte_carlo    298.17   707.2        977.29              1.371801321   2.277626857
> xml.validation         81.17    79.1         82.15               -0.025502033  0.012073426
> scimark.fft.small      931.98   919.09       931.98              -0.013830769  0
> scimark.lu.small       842.59   811.66       831.09              -0.036708245  -0.013648394
> scimark.sparse.small   70.95    70.94        65.5                -0.000140944  -0.076814658
> serial                 8.32     8            8                   -0.038461538  -0.038461538
>
> Thanks!
>
> Buqi
>
Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses
Posted by bu qi cheng <bu...@gmail.com>.
Hi Aleksey:
Sorry for the confusion. The data for "nextDouble" is as follows, where:
"clean" = no inlining + no sync elimination + no scalar replacement.
"max_level=2" = inlining + sync elimination, with escape analysis
covering 3 levels of methods in the call graph.
"InstanceInitialize inline" = inlining + inlining of "Random.initializer"
+ sync elimination and scalar replacement, with escape analysis covering
1 level (the method itself).

Benchmark              clean    max_level=2  InstanceInitialize  Improvement:  Improvement:
                                             inline              max_level=2   InstanceInitialize inline
crypto.aes             39.59    37.79        38.22               -0.045466027  -0.034604698
crypto.rsa             193.24   178.11       172.08              -0.078296419  -0.109501138
crypto.signverify      118.6    111.5        109.71              -0.059865093  -0.074957841
compiler.compiler      93.86    95.2         91.25               0.014276582   -0.027807373
compiler.sunflow       139.63   133.45       123.64              -0.04425983   -0.114516938
scimark.fft.large      14.8     15.01        14.93               0.014189189   0.008783784
scimark.sor.large      21.69    21.67        21.71               -0.000922084  0.000922084
scimark.sparse.large   12.86    12.77        12.88               -0.006998445  0.00155521
scimark.monte_carlo    298.17   707.2        977.29              1.371801321   2.277626857
xml.validation         81.17    79.1         82.15               -0.025502033  0.012073426
scimark.fft.small      931.98   919.09       931.98              -0.013830769  0
scimark.lu.small       842.59   811.66       831.09              -0.036708245  -0.013648394
scimark.sparse.small   70.95    70.94        65.5                -0.000140944  -0.076814658
serial                 8.32     8            8                   -0.038461538  -0.038461538
Thanks!
Buqi
Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses
Posted by Aleksey Shipilev <al...@gmail.com>.
Hi, Bu Qi!
It's great to have the escape analyzer make more thorough optimizations.
But the data seem to have been mangled by the mailer :) Can you decipher
this line? What do the numbers there mean?
scimark.monte_carlo 298.17 707.2 977.29 1.371801321 2.277627
And what is Cycler?
Thanks,
Aleksey.
On Tue, Oct 14, 2008 at 7:11 AM, bu qi cheng <bu...@gmail.com> wrote:
> Hi Aleksey:
>
> This is the data that does not rely on the InstanceInitialization bonus.
> However, we fixed the escape analysis and extended the analysis depth to
> max_level=2. With this fix, sync elimination is also done. However,
> scalar replacement still does not work. The data is as follows:
>
> clean max_level=2 InstanceInitialize inline max_level=2 InstanceInitialize inline
>
> From the data, the benefit distribution is: inline: 100, sync
> elimination: 300, scalar replacement: 300.
>
> So I think it is better to add the patches for the inliner and escape
> analysis at the same time. For scalar replacement, we are working on
> another escape analysis project (Cycler), which I think can solve the
> problem.
>
> Thanks!
>
> Buqi
>
Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses
Posted by bu qi cheng <bu...@gmail.com>.
Hi Aleksey:
This is the data that does not rely on the InstanceInitialization bonus.
However, we fixed the escape analysis and extended the analysis depth to
max_level=2. With this fix, sync elimination is also done. However,
scalar replacement still does not work. The data is as follows:

Benchmark              clean    max_level=2  InstanceInitialize  Improvement:  Improvement:
                                             inline              max_level=2   InstanceInitialize inline
crypto.aes             39.59    37.79        38.22               -0.045466027  -0.0346
crypto.rsa             193.24   178.11       172.08              -0.078296419  -0.1095
crypto.signverify      118.6    111.5        109.71              -0.059865093  -0.07496
compiler.compiler      93.86    95.2         91.25               0.014276582   -0.02781
compiler.sunflow       139.63   133.45       123.64              -0.04425983   -0.11452
scimark.fft.large      14.8     15.01        14.93               0.014189189   0.008784
scimark.sor.large      21.69    21.67        21.71               -0.000922084  0.000922
scimark.sparse.large   12.86    12.77        12.88               -0.006998445  0.001555
scimark.monte_carlo    298.17   707.2        977.29              1.371801321   2.277627
xml.validation         81.17    79.1         82.15               -0.025502033  0.012073
scimark.fft.small      931.98   919.09       931.98              -0.013830769  0
scimark.lu.small       842.59   811.66       831.09              -0.036708245  -0.01365
scimark.sparse.small   70.95    70.94        65.5                -0.000140944  -0.07681
serial                 8.32     8            8                   -0.038461538  -0.03846

From the data, the benefit distribution is: inline: 100, sync
elimination: 300, scalar replacement: 300.
So I think it is better to add the patches for the inliner and escape
analysis at the same time. For scalar replacement, we are working on
another escape analysis project (Cycler), which I think can solve the
problem.
Thanks!
Buqi
Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses
Posted by bu qi cheng <bu...@gmail.com>.
Hi Aleksey:
I agree with you that the hotness bonus fix is a good fix.
After more consideration, I think there is no need to inline the instance
initializer. There is also no need for the escape-analysis-directed
inlining I mentioned in my last email. Since most of the performance comes
from synchronization elimination and the inlining of nextDouble, it will be
enough if escape analysis can find that the object is non-escaping and
eliminate the synchronization. I will check whether escape analysis can do
analysis and optimization across multiple levels of methods.
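
To illustrate why the analysis depth matters here, the toy model below treats
an object as escaping whenever it is passed to a method deeper than the
analysis is allowed to look. This is only a sketch under assumed names: the
call graph, the method names in it, and the functions `escapes` and
`can_remove_sync` are hypothetical and do not reflect the real Jitrino
escape analysis.

```python
# Toy model: each method lists what it does with the tracked object.
# ("call", name) passes the object to another method;
# ("store_global",) publishes it where other threads could see it.
CALL_GRAPH = {
    "integrate": [("call", "nextDouble")],   # allocates the object, level 0
    "nextDouble": [("call", "next")],        # level 1
    "next": [],                              # level 2, only touches own fields
    "publish": [("store_global",)],
}


def escapes(method, max_level, level=0, graph=CALL_GRAPH):
    """Return True if the object may escape, analyzing levels 0..max_level."""
    if level > max_level:
        # Too deep to analyze: conservatively assume the object escapes.
        return True
    for action in graph[method]:
        if action[0] == "store_global":
            return True
        if action[0] == "call" and escapes(action[1], max_level, level + 1, graph):
            return True
    return False


def can_remove_sync(method, max_level):
    """Locks on a non-escaping object are invisible to other threads."""
    return not escapes(method, max_level)


for depth in (0, 1, 2):
    print(depth, can_remove_sync("integrate", depth))
```

In this model the lock can only be dropped once the analysis covers all three
levels (max_level=2), which matches the intuition behind extending the
analysis depth instead of forcing extra inlining.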
No problem, I will run SPECjvm2008 again. I have run it before (but did not
record the data), and found no obvious penalty.
Yes, the inliner will be tuned again and the logic will be double-checked.
I will report the new data to you once I have results.
Thanks!
Buqi
Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses
Posted by Aleksey Shipilev <al...@gmail.com>.
Hi, Buqi!
Let's revisit this patch. It was half a year ago, so I may easily
miss something. So, the patch consists of two parts:
1. Hotness bonus fix. This is a bug in the inliner heuristic: the
bonus is purely multiplicative, which means that if a really hot method
has a very small but negative inline benefit, then applying the hotness
bonus scales it to a very large negative inline benefit. That
effectively prevents this method from being inlined, which is obviously
not the right thing.
2. Instance initializer fix. This one is a workaround for escape
analysis inefficiencies. Of course, the clean way to deal with these
issues is to propagate escape analysis further. But I doubt that we can
afford that kind of patch in a short time. Is any Jitrino guru
available here?
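
To make the hotness bonus problem from item 1 concrete, here is a minimal
sketch. It is not the Jitrino code: the function names, the numbers, and the
sign-aware variant are all assumptions for illustration, and the actual patch
may fix the heuristic differently.

```python
def multiplicative_benefit(base_benefit, hotness_bonus):
    """Purely multiplicative scaling, the behavior described as a bug above."""
    return base_benefit * hotness_bonus


def sign_aware_benefit(base_benefit, hotness_bonus):
    """Hypothetical alternative: hotness never makes a call site look worse.

    Assumes hotness_bonus >= 1. A positive benefit is amplified; a negative
    one is pulled toward zero instead of being amplified.
    """
    if base_benefit >= 0:
        return base_benefit * hotness_bonus
    return base_benefit / hotness_bonus


# A slightly unattractive call site inside a very hot method (made-up values).
base, bonus = -2.0, 100.0
print(multiplicative_benefit(base, bonus))  # hotter method, far worse score
print(sign_aware_benefit(base, bonus))      # hotter method, score nearer zero
```

With the multiplicative form, the hotter the method, the less likely a
slightly negative call site is to be inlined, which is the opposite of what
a hotness bonus is meant to do.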
The max inline constant was changed only to fit the inline tree with
all required methods. If your research shows that 800 is enough, then
you may use it; I have no objections. Could you please extract the
hotness bonus fix, increase the max inline constant to 800, and then
run SPECjvm2008 again?
Personally, I think the entire inliner logic must be revisited, as
the heuristic was observed to give bad performance results for
this particular case, for other SPECjvm2008 benchmarks (at least serial
[1]), and for some of Stefan Krause's benchmarks.
Thanks,
Aleksey.
[1] https://issues.apache.org/jira/browse/HARMONY-5719
On Sun, Oct 12, 2008 at 5:18 AM, bu qi cheng <bu...@gmail.com> wrote:
> Hi Aleksey:
>
> Here is the performance data. The build we used is the one I checked out
> on August 4. We did not run startup.*, and many benchmarks still fail to
> run (not because of the patch). From this data, 800 looks suitable for
> MAX_INLINE_GROWTH_FACTOR_PROF. I am not sure what data we would get from
> benchmarks other than SPECjvm2008. Can you give more information on why
> MAX_INLINE_GROWTH_FACTOR_PROF = 2000?
>
> Configurations: CLEAN, MAX_INLINE_GROWTH_FACTOR_PROF = 800 and
> MAX_INLINE_GROWTH_FACTOR_PROF = 2000. The last two columns are the change
> relative to Clean.
>
> Benchmark              Clean    800      2000      800 vs Clean  2000 vs Clean
> crypto.aes             39.59    38.22    37.08     -3.46%        -6.34%
> crypto.rsa             193.24   172.08   178.08    -10.95%       -7.85%
> crypto.signverify      118.6    109.71   107.61    -7.50%        -9.27%
> compiler.compiler      93.86    91.25    87.61     -2.78%        -6.66%
> compiler.sunflow       139.63   123.64   136.65    -11.45%       -2.13%
> scimark.fft.large      14.8     14.93    15.92     0.88%         7.57%
> scimark.sor.large      21.69    21.71    21.65     0.09%         -0.18%
> scimark.sparse.large   12.86    12.88    12.85     0.16%         -0.08%
> scimark.monte_carlo    298.17   977.29   1024.55   227.76%       243.61%
> derby                  42.73    40.13    41.8      -6.08%        -2.18%
> compress               117.23   111.12   110.12    -5.21%        -6.07%
> xml.validation         81.17    82.15    80.75     1.21%         -0.52%
> scimark.fft.small      931.98   931.98   903.96    0.00%         -3.01%
> scimark.lu.small       842.59   831.09   841.83    -1.36%        -0.09%
> scimark.sparse.small   70.95    65.5     70.7      -7.68%        -0.35%
> serial                 8.32     8        8.23      -3.85%        -1.08%
>
> Another problem: as you mentioned, escape-analysis-directed inlining is
> more suitable for this case. So I am wondering whether it is appropriate to
> commit the whole patch. Maybe it is better to commit only the adjustment of
> MAX_INLINE_GROWTH_FACTOR_PROF, which will introduce about 30% performance
> improvement in monte_carlo. However, redesigning the optimizations of
> Harmony would be hard work. One general consideration is to move the basic
> analyses (such as escape analysis, which determines the live range and
> scope of objects) toward the front of the pipeline. What do you think?
>
> Thanks!
>
> Buqi
>