Posted to dev@hive.apache.org by "Zhang, Liyun" <li...@intel.com> on 2017/11/15 15:58:24 UTC
Anyone knows the problem I found in
VectorizedLogicBench.IfExprLongColumnLongColumnBench?
Hi all:
I am using the Hive microbenchmarks (HIVE-10189) to measure the performance improvement from AVX2 and AVX512.
When I ran VectorizedLogicBench.IfExprLongColumnLongColumnBench<https://github.com/apache/hive/blob/master/itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizedLogicBench.java#L115>, I got the following results:
When enabling AVX512:
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench avgt 10 1621602.652 ± 583775.700 us/op
When enabling AVX2:
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench avgt 10 1817855.876 ± 49289.868 us/op
As you can see, there is a large variation for IfExprLongColumnLongColumnBench.bench with AVX512: the error is 583775 us/op against an average of 1621602 us/op, about 36% of the mean. This suggests that the individual measurements are widely scattered. @Teddy, as you are more familiar with the code, do you know why the measurements vary so much? If they are this scattered, does it mean the benchmark is not stable?
I would appreciate any feedback!
Best Regards
Kelly Zhang/Zhang,Liyun
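To put the two results above side by side, here is a minimal Java sketch that computes the error as a fraction of the mean and checks whether the two intervals overlap. The class and method names are hypothetical and not part of Hive or JMH; the only inputs are the numbers quoted in the message. It assumes JMH's default reporting, where the value after ± is the half-width of a confidence interval around the score.

```java
import java.util.Locale;

// Hypothetical helper for reading JMH "score ± error" pairs; not part of Hive or JMH.
final class JmhSpread {

    /** Returns the reported error as a fraction of the mean score. */
    static double relativeError(double score, double error) {
        if (score == 0.0) {
            throw new IllegalArgumentException("score must be non-zero");
        }
        return Math.abs(error / score);
    }

    /** Returns true when [meanA - errA, meanA + errA] and [meanB - errB, meanB + errB] overlap. */
    static boolean intervalsOverlap(double meanA, double errA, double meanB, double errB) {
        return Math.abs(meanA - meanB) <= errA + errB;
    }

    public static void main(String[] args) {
        // Values copied from the benchmark output quoted in the message (us/op).
        double avx512Mean = 1621602.652, avx512Err = 583775.700;
        double avx2Mean = 1817855.876, avx2Err = 49289.868;

        System.out.printf(Locale.ROOT, "AVX512 relative error: %.1f%%%n",
                100 * relativeError(avx512Mean, avx512Err));
        System.out.printf(Locale.ROOT, "AVX2 relative error:   %.1f%%%n",
                100 * relativeError(avx2Mean, avx2Err));
        System.out.println("Intervals overlap: "
                + intervalsOverlap(avx512Mean, avx512Err, avx2Mean, avx2Err));
    }
}
```

With these inputs the relative errors come out at roughly 36% for AVX512 and 2.7% for AVX2, and the two intervals overlap, so this run alone does not show that the AVX512 result is actually faster.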
Re: Anyone knows the problem I found in
VectorizedLogicBench.IfExprLongColumnLongColumnBench?
Posted by Gopal Vijayaraghavan <go...@apache.org>.
> My guess is that the complex expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench actually uses more CPU than other expression.
If you have a -XX:+PrintAssembly dump (or run JMH with -prof perfasm), we can see whether the JDK is auto-vectorizing that loop.
That's an if-condition evaluation loop without a branch, written that way specifically so that the JIT can speed it up.
Cheers,
Gopal
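For readers who have not seen the pattern, a branch-free "if" over long columns can be sketched as below. This illustrates the general mask-and-select technique only. It is not copied from the Hive source, the class and method names are hypothetical, and it assumes the condition column holds only 0 or 1.

```java
import java.util.Arrays;

// Hypothetical illustration of a branchless select loop; not the Hive implementation.
final class BranchlessSelect {

    /**
     * For each i < n, writes a[i] to out[i] when cond[i] == 1 and b[i] when cond[i] == 0.
     * Assumes every cond[i] is exactly 0 or 1.
     */
    static void select(long[] cond, long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            // cond == 1 gives mask == 0; cond == 0 gives mask == -1 (all bits set).
            long mask = cond[i] - 1;
            // Exactly one of the two AND terms survives, so no branch is needed.
            out[i] = (~mask & a[i]) | (mask & b[i]);
        }
    }

    public static void main(String[] args) {
        long[] cond = {1, 0, 1, 0};
        long[] a = {10, 20, 30, 40};
        long[] b = {-1, -2, -3, -4};
        long[] out = new long[cond.length];
        select(cond, a, b, out, cond.length);
        System.out.println(Arrays.toString(out)); // prints [10, -2, 30, -4]
    }
}
```

The loop body is pure arithmetic with no data-dependent control flow, which is the kind of loop a JIT compiler may be able to turn into SIMD instructions. Whether it actually does so on a given JDK and CPU is what the assembly dump would show.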
RE: Anyone knows the problem I found in
VectorizedLogicBench.IfExprLongColumnLongColumnBench?
Posted by "Zhang, Liyun" <li...@intel.com>.
Hi Gopal:
Thanks very much for your reply!
Do you mean that if I pin VectorizedLogicBench.IfExprLongColumnLongColumnBench to a single CPU, the variation will be smaller? If so, that matches what I see: the variation became smaller than before after using taskset -cp 1 $pid. But I am confused: if all the tests in VectorizedLogicBench are well pipelined and vectorized, why is there no large variation for the other tests in VectorizedLogicBench? My guess is that the complex expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench actually uses more CPU than the other expressions.
The expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprLongColumnLongColumn.java#L90
Best Regards
Kelly Zhang/Zhang,Liyun
-----Original Message-----
From: Gopal Vijayaraghavan [mailto:gopalv@apache.org]
Sent: Thursday, November 16, 2017 5:40 AM
To: dev@hive.apache.org
Cc: Zhang, Liyun <li...@intel.com>; Teddy Choi <tc...@hortonworks.com>
Subject: Re: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?
Hi,
> You see that there is a great float for IfExprLongColumnLongColumnBench.bench, the float is 583775 and the average value is 1621602.
In my tests, the single-core tests tended to have huge variations on Intel CPUs with Turbo Boost.
CPU operations which are fast when stressing the CPU in single-threaded mode tended to get really slow when the other cores spin up and the chip hits its thermal limits.
For most memory-bound operations this is not easily visible, but the better pipelined and vectorized the loops get, the worse the impact of dynamic CPU frequency scaling.
Can you collect active CPU frequency when running this benchmark and do "taskset -c 1" to force the run to stick to a single CPU?
Cheers,
Gopal
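One possible way to collect the active CPU frequency asked for above is to sample /proc/cpuinfo while the benchmark runs. The Java sketch below is an assumption-laden illustration: the class and method names are hypothetical, and it assumes a Linux x86 system where /proc/cpuinfo contains one "cpu MHz" line per logical CPU. Other platforms may not expose that field.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sampler; assumes the Linux x86 /proc/cpuinfo layout with "cpu MHz" lines.
final class CpuFreqSampler {

    /** Extracts the value of every "cpu MHz : <number>" line, in file order. */
    static List<Double> parseMhz(List<String> lines) {
        List<Double> mhz = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("cpu MHz")) {
                int colon = line.indexOf(':');
                if (colon >= 0) {
                    mhz.add(Double.parseDouble(line.substring(colon + 1).trim()));
                }
            }
        }
        return mhz;
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (!Files.isReadable(cpuinfo)) {
            System.out.println("/proc/cpuinfo is not readable on this system");
            return;
        }
        // One sample; run this repeatedly during the benchmark to see the frequency move.
        List<Double> mhz = parseMhz(Files.readAllLines(cpuinfo));
        for (int cpu = 0; cpu < mhz.size(); cpu++) {
            System.out.println("cpu " + cpu + ": " + mhz.get(cpu) + " MHz");
        }
    }
}
```

Comparing samples taken during stable and unstable iterations would show whether the frequency of the pinned CPU drops when the other cores become busy.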