You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by Todd Lipcon <to...@apache.org> on 2017/06/07 00:15:35 UTC

Fwd: [kudu-CR] codegen: fix regression in inlining after LLVM 4.0 upgrade

FYI Impala folks: you might hit this issue when you upgrade to llvm 4.0 as
well. Something to be aware of
---------- Forwarded message ----------
From: "Todd Lipcon (Code Review)" <ge...@cloudera.org>
Date: Jun 6, 2017 5:06 PM
Subject: [kudu-CR] codegen: fix regression in inlining after LLVM 4.0
upgrade
To: "Dan Burkert" <da...@apache.org>, "David Ribeiro Alves" <
davidralves@gmail.com>, "Binglin Chang" <de...@gmail.com>, <
reviews@kudu.incubator.apache.org>
Cc: "Todd Lipcon" <to...@apache.org>

Hello Dan Burkert,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7099

to review the following change.

Change subject: codegen: fix regression in inlining after LLVM 4.0 upgrade
......................................................................

codegen: fix regression in inlining after LLVM 4.0 upgrade

With the upgrade to LLVM 4.0, the performance of our generated code
regressed significantly. After looking at the generated assembly from
codegen-test, I could see that there were many 'call' instructions that
didn't appear in the earlier output, which led me to suspect the
inliner.

After adding a 'module->dump()' call and looking at the output, I could
see that the calls were to utility functions like BitmapTest() which
should be inlined due to be very small. However, the function was marked
'noinline' in precompiled.ll. After a bit of Googling I came across a
thread[1] in which someone else noticed that all of their functions had
unexpected 'noinline' attributes after upgrading to 4.0.

After following the breadcrumbs, I found some advice to change the way
in which we emit precompiled.ll to disable the built-in LLVM passes
which were responsible for adding this attribute.

During my investigation, I also found that we should have been passing
the optimization and size levels into the inlining pass, so I threw
that change in for good measure.

This fixed the perf regression. I benchmarked using:
  memrowset-test --roundtrip_num_rows=10000000 \
                 --gtest_filter=\*InsertCount\* \
                 --gtest_repeat=10

Before this fix:

  I0606 16:55:51.448117 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.502s        user 0.502s     sys 0.000s
  I0606 16:55:57.937391 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.500s        user 0.499s     sys 0.000s
  I0606 16:56:04.330037 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.498s        user 0.499s     sys 0.000s
  I0606 16:56:10.706786 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.504s        user 0.503s     sys 0.001s
  I0606 16:56:17.094764 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.510s        user 0.510s     sys 0.000s
  I0606 16:56:23.474200 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.502s        user 0.502s     sys 0.000s
  I0606 16:56:29.845799 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.503s        user 0.503s     sys 0.000s
  I0606 16:56:36.221843 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.502s        user 0.503s     sys 0.000s
  I0606 16:56:42.599604 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.502s        user 0.503s     sys 0.000s
  I0606 16:56:48.962482 48991 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.503s        user 0.502s     sys 0.000s

After this fix:

  I0606 16:53:51.573541 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.302s   user 0.302s     sys 0.000s
  I0606 16:53:57.922485 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.301s   user 0.301s     sys 0.000s
  I0606 16:54:04.186511 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.310s   user 0.310s     sys 0.000s
  I0606 16:54:10.425207 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.313s   user 0.313s     sys 0.000s
  I0606 16:54:16.648202 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.316s   user 0.316s     sys 0.000s
  I0606 16:54:22.868655 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.311s   user 0.312s     sys 0.000s
  I0606 16:54:29.081784 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.313s   user 0.314s     sys 0.000s
  I0606 16:54:35.317623 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.314s   user 0.312s     sys 0.003s
  I0606 16:54:41.550542 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.316s   user 0.315s     sys 0.000s
  I0606 16:54:47.761406 48403 memrowset-test.cc:446] Time spent Scanning
rows where all are committed: real 0.314s   user 0.314s     sys 0.000s

I also manually inspected the assembly from codegen-test and compared vs
a version compiled with LLVM 3.9 and found that it now has fewer
instructions than before.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/113175.html

Change-Id: I7e449df80e5cd02b9ce2dbf4daa5cf8151007dcf
---
M src/kudu/codegen/CMakeLists.txt
M src/kudu/codegen/module_builder.cc
2 files changed, 11 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/7099/1
--
To view, visit http://gerrit.cloudera.org:8080/7099
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7e449df80e5cd02b9ce2dbf4daa5cf8151007dcf
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>

Re: [kudu-CR] codegen: fix regression in inlining after LLVM 4.0 upgrade

Posted by Tim Wood <tw...@cloudera.com>.
Super interesting!  Nice sleuthing!
TW
Impala QE

On Tue, Jun 6, 2017 at 5:15 PM, Todd Lipcon <to...@apache.org> wrote:

> FYI Impala folks: you might hit this issue when you upgrade to llvm 4.0 as
> well. Something to be aware of
> ---------- Forwarded message ----------
> From: "Todd Lipcon (Code Review)" <ge...@cloudera.org>
> Date: Jun 6, 2017 5:06 PM
> Subject: [kudu-CR] codegen: fix regression in inlining after LLVM 4.0
> upgrade
> To: "Dan Burkert" <da...@apache.org>, "David Ribeiro Alves" <
> davidralves@gmail.com>, "Binglin Chang" <de...@gmail.com>, <
> reviews@kudu.incubator.apache.org>
> Cc: "Todd Lipcon" <to...@apache.org>
>
> Hello Dan Burkert,
>
> I'd like you to do a code review.  Please visit
>
>     http://gerrit.cloudera.org:8080/7099
>
> to review the following change.
>
> Change subject: codegen: fix regression in inlining after LLVM 4.0 upgrade
> ......................................................................
>
> codegen: fix regression in inlining after LLVM 4.0 upgrade
>
> With the upgrade to LLVM 4.0, the performance of our generated code
> regressed significantly. After looking at the generated assembly from
> codegen-test, I could see that there were many 'call' instructions that
> didn't appear in the earlier output, which led me to suspect the
> inliner.
>
> After adding a 'module->dump()' call and looking at the output, I could
> see that the calls were to utility functions like BitmapTest() which
> should be inlined due to be very small. However, the function was marked
> 'noinline' in precompiled.ll. After a bit of Googling I came across a
> thread[1] in which someone else noticed that all of their functions had
> unexpected 'noinline' attributes after upgrading to 4.0.
>
> After following the breadcrumbs, I found some advice to change the way
> in which we emit precompiled.ll to disable the built-in LLVM passes
> which were responsible for adding this attribute.
>
> During my investigation, I also found that we should have been passing
> the optimization and size levels into the inlining pass, so I threw
> that change in for good measure.
>
> This fixed the perf regression. I benchmarked using:
>   memrowset-test --roundtrip_num_rows=10000000 \
>                  --gtest_filter=\*InsertCount\* \
>                  --gtest_repeat=10
>
> Before this fix:
>
>   I0606 16:55:51.448117 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.502s        user 0.502s     sys 0.000s
>   I0606 16:55:57.937391 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.500s        user 0.499s     sys 0.000s
>   I0606 16:56:04.330037 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.498s        user 0.499s     sys 0.000s
>   I0606 16:56:10.706786 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.504s        user 0.503s     sys 0.001s
>   I0606 16:56:17.094764 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.510s        user 0.510s     sys 0.000s
>   I0606 16:56:23.474200 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.502s        user 0.502s     sys 0.000s
>   I0606 16:56:29.845799 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.503s        user 0.503s     sys 0.000s
>   I0606 16:56:36.221843 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.502s        user 0.503s     sys 0.000s
>   I0606 16:56:42.599604 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.502s        user 0.503s     sys 0.000s
>   I0606 16:56:48.962482 48991 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.503s        user 0.502s     sys 0.000s
>
> After this fix:
>
>   I0606 16:53:51.573541 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.302s   user 0.302s     sys 0.000s
>   I0606 16:53:57.922485 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.301s   user 0.301s     sys 0.000s
>   I0606 16:54:04.186511 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.310s   user 0.310s     sys 0.000s
>   I0606 16:54:10.425207 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.313s   user 0.313s     sys 0.000s
>   I0606 16:54:16.648202 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.316s   user 0.316s     sys 0.000s
>   I0606 16:54:22.868655 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.311s   user 0.312s     sys 0.000s
>   I0606 16:54:29.081784 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.313s   user 0.314s     sys 0.000s
>   I0606 16:54:35.317623 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.314s   user 0.312s     sys 0.003s
>   I0606 16:54:41.550542 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.316s   user 0.315s     sys 0.000s
>   I0606 16:54:47.761406 48403 memrowset-test.cc:446] Time spent Scanning
> rows where all are committed: real 0.314s   user 0.314s     sys 0.000s
>
> I also manually inspected the assembly from codegen-test and compared vs
> a version compiled with LLVM 3.9 and found that it now has fewer
> instructions than before.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/113175.html
>
> Change-Id: I7e449df80e5cd02b9ce2dbf4daa5cf8151007dcf
> ---
> M src/kudu/codegen/CMakeLists.txt
> M src/kudu/codegen/module_builder.cc
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
>
>   git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/7099/1
> --
> To view, visit http://gerrit.cloudera.org:8080/7099
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: newchange
> Gerrit-Change-Id: I7e449df80e5cd02b9ce2dbf4daa5cf8151007dcf
> Gerrit-PatchSet: 1
> Gerrit-Project: kudu
> Gerrit-Branch: master
> Gerrit-Owner: Todd Lipcon <to...@apache.org>
> Gerrit-Reviewer: Dan Burkert <da...@apache.org>
>