Posted to dev@harmony.apache.org by bu qi cheng <bu...@gmail.com> on 2008/10/12 03:18:26 UTC

[drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses

Hi Aleksey:

       Here is the performance data. The version we used is the one I
checked out on August 4. We did not run startup.*, and many other
benchmarks still failed to run (not because of the patch). From these
data, 800 looks suitable for MAX_INLINE_GROWTH_FACTOR_PROF. I am not sure
what data could be obtained from benchmarks other than SPECjvm2008. Can
you give more information on why MAX_INLINE_GROWTH_FACTOR_PROF = 2000 was
chosen?

CLEAN
MAX_INLINE_GROWTH_FACTOR_PROF = 800
MAX_INLINE_GROWTH_FACTOR_PROF = 2000


Benchmark              Clean    800      2000     800 vs. Clean  2000 vs. Clean
crypto.aes             39.59    38.22    37.08    -3.46%         -6.34%
crypto.rsa             193.24   172.08   178.08   -10.95%        -7.85%
crypto.signverify      118.6    109.71   107.61   -7.50%         -9.27%
compiler.compiler      93.86    91.25    87.61    -2.78%         -6.66%
compiler.sunflow       139.63   123.64   136.65   -11.45%        -2.13%
scimark.fft.large      14.8     14.93    15.92    0.88%          7.57%
scimark.sor.large      21.69    21.71    21.65    0.09%          -0.18%
scimark.sparse.large   12.86    12.88    12.85    0.16%          -0.08%
scimark.monte_carlo    298.17   977.29   1024.55  227.76%        243.61%
derby                  42.73    40.13    41.8     -6.08%         -2.18%
compress               117.23   111.12   110.12   -5.21%         -6.07%
xml.validation         81.17    82.15    80.75    1.21%          -0.52%
scimark.fft.small      931.98   931.98   903.96   0.00%          -3.01%
scimark.lu.small       842.59   831.09   841.83   -1.36%         -0.09%
scimark.sparse.small   70.95    65.5     70.7     -7.68%         -0.35%
serial                 8.32     8        8.23     -3.85%         -1.08%
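
The last two columns can be reproduced from the raw scores as the change relative to the clean build. A minimal sketch, assuming the percentages are computed as (score - clean) / clean; the function name relative_change is illustrative and not part of any Harmony tooling:

```python
def relative_change(clean, score):
    """Change of `score` relative to `clean`, as a percentage."""
    return (score - clean) / clean * 100.0


# Scores copied from the table above: (clean, 800, 2000).
rows = {
    "crypto.aes": (39.59, 38.22, 37.08),
    "scimark.monte_carlo": (298.17, 977.29, 1024.55),
}

for name, (clean, s800, s2000) in rows.items():
    change_800 = relative_change(clean, s800)
    change_2000 = relative_change(clean, s2000)
    print(f"{name}: {change_800:.2f}% {change_2000:.2f}%")
```

Under this formula a negative value means the score went down relative to the clean build.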

Another issue: as you mentioned, escape analysis directed inlining is a
better fit for this case, so I am wondering whether it is appropriate to
commit the whole patch. Maybe it is better to commit only the adjustment
of MAX_INLINE_GROWTH_FACTOR_PROF, which gives about a 30% performance
improvement in monte_carlo. However, redesigning the optimizations of
Harmony would be hard work. One general idea is to move the basic
analyses (such as the live range and scope of objects, i.e. escape
analysis, ..) to the front of the pipeline. What do you think of it?

Thanks!

Buqi

Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses

Posted by bu qi cheng <bu...@gmail.com>.
Aleksey:

    Thanks for the help. No problem. I will update the patch.

Thanks!

Buqi

Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses

Posted by Aleksey Shipilev <al...@gmail.com>.
Thanks, Bu Qi!

I'm fine with "max_level=2" approach. We may review the
"Random.initialize()" issue more thoroughly later. Would you please
come up with the clean patch?

Thanks again,
Aleksey.

On Wed, Oct 15, 2008 at 2:09 PM, bu qi cheng <bu...@gmail.com> wrote:
> Hi Aleksey:
>
>     Sorry for the confusion. The data for "nextDouble" is as follows,
> where:
> "clean" = no inline + no sync elimination + no scalar replacement.
> "max_level=2" = inline + sync elimination by escape analysis with 3-level
> (in call graph) method analysis.
> "InstanceInitialize inline" = inline + inline "Random.initializer" + sync
> elimination and scalar replacement by escape analysis with 1-level (self)
> method analysis.
>
>                                                                Improvement
> Benchmark             clean   max_level=2  InstanceInitialize  max_level=2   InstanceInitialize
>                                            inline                            inline
> crypto.aes            39.59   37.79        38.22               -0.045466027  -0.034604698
> crypto.rsa            193.24  178.11       172.08              -0.078296419  -0.109501138
> crypto.signverify     118.6   111.5        109.71              -0.059865093  -0.074957841
> compiler.compiler     93.86   95.2         91.25               0.014276582   -0.027807373
> compiler.sunflow      139.63  133.45       123.64              -0.04425983   -0.114516938
> scimark.fft.large     14.8    15.01        14.93               0.014189189   0.008783784
> scimark.sor.large     21.69   21.67        21.71               -0.000922084  0.000922084
> scimark.sparse.large  12.86   12.77        12.88               -0.006998445  0.00155521
> scimark.monte_carlo   298.17  707.2        977.29              1.371801321   2.277626857
> xml.validation        81.17   79.1         82.15               -0.025502033  0.012073426
> scimark.fft.small     931.98  919.09       931.98              -0.013830769  0
> scimark.lu.small      842.59  811.66       831.09              -0.036708245  -0.013648394
> scimark.sparse.small  70.95   70.94        65.5                -0.000140944  -0.076814658
> serial                8.32    8            8                   -0.038461538  -0.038461538
>
> Thanks!
>
> Buqi
>

Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses

Posted by bu qi cheng <bu...@gmail.com>.
Hi Aleksey:

     Sorry for the confusion. The data for "nextDouble" is as follows,
where:
"clean" = no inline + no sync elimination + no scalar replacement.
"max_level=2" = inline + sync elimination by escape analysis with 3-level
(in call graph) method analysis.
"InstanceInitialize inline" = inline + inline "Random.initializer" + sync
elimination and scalar replacement by escape analysis with 1-level (self)
method analysis.

                                                               Improvement
Benchmark             clean   max_level=2  InstanceInitialize  max_level=2   InstanceInitialize
                                           inline                            inline
crypto.aes            39.59   37.79        38.22               -0.045466027  -0.034604698
crypto.rsa            193.24  178.11       172.08              -0.078296419  -0.109501138
crypto.signverify     118.6   111.5        109.71              -0.059865093  -0.074957841
compiler.compiler     93.86   95.2         91.25               0.014276582   -0.027807373
compiler.sunflow      139.63  133.45       123.64              -0.04425983   -0.114516938
scimark.fft.large     14.8    15.01        14.93               0.014189189   0.008783784
scimark.sor.large     21.69   21.67        21.71               -0.000922084  0.000922084
scimark.sparse.large  12.86   12.77        12.88               -0.006998445  0.00155521
scimark.monte_carlo   298.17  707.2        977.29              1.371801321   2.277626857
xml.validation        81.17   79.1         82.15               -0.025502033  0.012073426
scimark.fft.small     931.98  919.09       931.98              -0.013830769  0
scimark.lu.small      842.59  811.66       831.09              -0.036708245  -0.013648394
scimark.sparse.small  70.95   70.94        65.5                -0.000140944  -0.076814658
serial                8.32    8            8                   -0.038461538  -0.038461538

Thanks!

Buqi

Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses

Posted by Aleksey Shipilev <al...@gmail.com>.
Hi, Bu Qi!

It's great to have the escape analyzer make more thorough optimizations.
But the data seem to be crapped by mailer :) Can you decipher this
line? What do the numbers there mean?

scimark.monte_carlo 298.17 707.2 977.29 1.371801321 2.277627

And what's the Cycler?

Thanks,
Aleksey.

On Tue, Oct 14, 2008 at 7:11 AM, bu qi cheng <bu...@gmail.com> wrote:
> Hi Aleksey:
>
>  This is the data that does not rely on the InstanceInitialization bonus.
> Instead, we fixed the escape analysis and extended the method analysis
> level to max_level=2. With this fix, sync elimination is also performed.
> However, scalar replacement still does not work. The data is as follows:
>
>      clean max_level=2 InstanceInitialize inline max_level=2 InstanceInitialize inline


> From the data, the benefit distribution is: inline: 100, sync
> elimination: 300, scalar replacement: 300.
>
> So I think it is better to add the patches for the inliner and escape
> analysis at the same time. For scalar replacement, we are working on
> another escape analysis project (Cycler), which I think can solve the
> problem.
>
> Thanks!
>
> Buqi
>

Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses

Posted by bu qi cheng <bu...@gmail.com>.
Hi Aleksey:

 This is the data that does not rely on the InstanceInitialization bonus.
Instead, we fixed the escape analysis and extended the method analysis
level to max_level=2. With this fix, sync elimination is also performed.
However, scalar replacement still does not work. The data is as follows:

                                                               Change vs. clean
Benchmark             clean   max_level=2  InstanceInitialize  max_level=2   InstanceInitialize
                                           inline                            inline
crypto.aes            39.59   37.79        38.22               -0.045466027  -0.0346
crypto.rsa            193.24  178.11       172.08              -0.078296419  -0.1095
crypto.signverify     118.6   111.5        109.71              -0.059865093  -0.07496
compiler.compiler     93.86   95.2         91.25               0.014276582   -0.02781
compiler.sunflow      139.63  133.45       123.64              -0.04425983   -0.11452
scimark.fft.large     14.8    15.01        14.93               0.014189189   0.008784
scimark.sor.large     21.69   21.67        21.71               -0.000922084  0.000922
scimark.sparse.large  12.86   12.77        12.88               -0.006998445  0.001555
scimark.monte_carlo   298.17  707.2        977.29              1.371801321   2.277627
xml.validation        81.17   79.1         82.15               -0.025502033  0.012073
scimark.fft.small     931.98  919.09       931.98              -0.013830769  0
scimark.lu.small      842.59  811.66       831.09              -0.036708245  -0.01365
scimark.sparse.small  70.95   70.94        65.5                -0.000140944  -0.07681
serial                8.32    8            8                   -0.038461538  -0.03846

From the data, the benefit distribution is: inline: 100, sync
elimination: 300, scalar replacement: 300.
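
One way to read this split, assuming it refers to the scimark.monte_carlo row of the table above and that the three configurations differ as described in this thread (this decomposition is an interpretation, not something the data states explicitly):

```python
# scimark.monte_carlo scores copied from the table above.
clean = 298.17
max_level_2 = 707.2            # inlining + sync elimination
instance_init_inline = 977.29  # additionally gets scalar replacement

# Gain attributed to inlining plus sync elimination (compare with 100 + 300).
inline_plus_sync = max_level_2 - clean

# Gain attributed to scalar replacement (compare with 300).
scalar_replacement = instance_init_inline - max_level_2

print(f"inline + sync elimination: {inline_plus_sync:.2f}")
print(f"scalar replacement:        {scalar_replacement:.2f}")
```

The rounded figures in the message (100, 300, 300) are roughly in line with these differences.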

So I think it is better to add the patches for the inliner and escape
analysis at the same time. For scalar replacement, we are working on
another escape analysis project (Cycler), which I think can solve the
problem.

Thanks!

Buqi

Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses

Posted by bu qi cheng <bu...@gmail.com>.
Hi Aleksey:

    I agree with you that the hotness bonus fix is a good fix.

    After more consideration, I think there is no need to inline the
instance initializer. Also, there is no need for the escape analysis
directed inlining that I mentioned in my last email. Since most of the
performance gain comes from synchronization elimination and the inlining
of nextDouble, it will be enough if escape analysis can find that the
object does not escape and eliminate the synchronization. I will check
whether escape analysis can do its analysis and optimization across
multiple method levels.
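
To make the multi-level idea concrete, here is a toy sketch of a depth-limited escape check. It is not Jitrino's algorithm: the call graph, the method names, and the rule "assume escape when the depth budget runs out" are all assumptions chosen for illustration.

```python
# Toy call-graph summary: for each method, whether it stores its argument
# somewhere globally reachable, and which callees receive the argument.
# All method names here are hypothetical.
CALL_GRAPH = {
    "caller":     {"stores_globally": False, "passes_to": ["nextDouble"]},
    "nextDouble": {"stores_globally": False, "passes_to": ["helper"]},
    "helper":     {"stores_globally": False, "passes_to": []},
}


def may_escape(method, max_level, level=0):
    """Return True if an object handed to `method` may escape.

    Callees are analysed only while `level` is below `max_level`; past
    that budget the check conservatively assumes the object escapes.
    """
    summary = CALL_GRAPH[method]
    if summary["stores_globally"]:
        return True
    for callee in summary["passes_to"]:
        if level >= max_level:
            return True  # out of analysis budget: assume the worst
        if may_escape(callee, max_level, level + 1):
            return True
    return False


def can_eliminate_sync(method, max_level):
    """Synchronization on an object that does not escape can be removed."""
    return not may_escape(method, max_level)


for depth in (0, 1, 2):
    print(depth, can_eliminate_sync("caller", depth))
```

In this toy graph the object is only proven non-escaping once the check is allowed to follow two levels of calls, which is the kind of effect a larger max_level is meant to capture.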

   No problem, I will run SPECjvm2008 again. I have run it before (but did
not record the data), and no obvious penalty was found.
   Yes, the inliner will be tuned again and the logic will be double
checked. I will report the new data to you once I have results.

Thanks!

Buqi


Re: [drlvm][jit][opt][performance] Inliner heuristics improvements: hotness and instance initializer bonuses

Posted by Aleksey Shipilev <al...@gmail.com>.
Hi, Buqi!

Let's revisit this patch. It was half a year ago, so I may easily
miss something. So, the patch consists of two parts:

 1. Hotness bonus fix. This is a bug in the inliner heuristic: the
bonus is purely multiplicative, which means that if a really hot method
has a very small but negative inline benefit, then applying the hotness
bonus scales it to a very large negative inline benefit. That
effectively prevents the method from being inlined, which is obviously
not the right thing.

 2. Instance initializer fix. This one is a workaround for escape
analysis inefficiencies. Of course, the clean way to deal with the issue
would be to propagate escape analysis further. But I doubt we can have
such a patch in a short time. Is any Jitrino guru available here?
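
The hotness bonus problem from item 1 can be illustrated with a small sketch. The formulas, function names, and numbers below are assumptions made for illustration; they are not taken from the Jitrino inliner or from the patch.

```python
def benefit_multiplicative(base_benefit, hotness):
    """Purely multiplicative bonus: the sign of base_benefit is kept,
    so a slightly negative benefit becomes hugely negative for hot code."""
    return base_benefit * hotness


def benefit_sign_aware(base_benefit, hotness):
    """One possible alternative (an assumption, not the actual patch):
    hotness amplifies positive benefits and damps negative ones.
    Assumes hotness >= 1."""
    if base_benefit >= 0:
        return base_benefit * hotness
    return base_benefit / hotness


base, hotness = -2.0, 1000.0  # small negative benefit, very hot method
print(benefit_multiplicative(base, hotness))
print(benefit_sign_aware(base, hotness))
```

With the multiplicative form the hot method ends up with the worst score of all candidates; with the sign-aware form its small penalty stays small, so other bonuses can still tip the decision toward inlining.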

And max inline constant is just changed to fit the inline tree with
all required methods. If your research shows that 800 is enough, then
you may use it, I have no objections against that. Could you please
try to extract hotness bonus fix, increase max inline constant to 800
and then run the SPECjvm2008 again?

Personally, I think that entire inliner logic must be revisited, as
the heuristic used was observed to have bad performance results for
this particular case, other benchmarks of SPECjvm2008 (at least serial
[1]), and some of Stefan Krause's benchmarks.

Thanks,
Aleksey.

[1] https://issues.apache.org/jira/browse/HARMONY-5719
