Posted to dev@hive.apache.org by "Zhang, Liyun" <li...@intel.com> on 2017/11/15 15:58:24 UTC

Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

Hi all,
I am using the Hive microbenchmarks (HIVE-10189) to measure the performance improvement from AVX2 and AVX512.
When I ran VectorizedLogicBench.IfExprLongColumnLongColumnBench<https://github.com/apache/hive/blob/master/itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizedLogicBench.java#L115>, I got the following results.
With AVX512 enabled:
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench             avgt       10  1621602.652 ± 583775.700  us/op
With AVX2 enabled:
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench             avgt       10  1817855.876 ± 49289.868  us/op

You can see that the error margin for IfExprLongColumnLongColumnBench.bench is very large with AVX512: it is 583775 against an average of 1621602, which suggests that the individual measurements are widely scattered. @Teddy, as you are more familiar with the code, do you know why the measurements vary so much? Does such a wide spread mean that the benchmark results are not stable?
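
To compare the two runs on the same scale, the error margin can be divided by the average. The following is only a minimal sketch and not part of the Hive benchmark suite; the class and method names are hypothetical, and the numbers are copied from the two result lines above.

```java
public class JmhRelativeError {
    /** Returns the reported error margin as a fraction of the average score. */
    public static double relativeError(double score, double error) {
        return error / score;
    }

    public static void main(String[] args) {
        // Values copied from the JMH result lines quoted in this message.
        double avx512 = relativeError(1621602.652, 583775.700);
        double avx2 = relativeError(1817855.876, 49289.868);
        System.out.printf("AVX512: +/- %.1f%% of the average%n", avx512 * 100);
        System.out.printf("AVX2:   +/- %.1f%% of the average%n", avx2 * 100);
    }
}
```

With these inputs the AVX512 margin is roughly 36% of the average, while the AVX2 margin is under 3%.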



I would appreciate any feedback!
Best Regards
Kelly Zhang/Zhang,Liyun

Re: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

Posted by Gopal Vijayaraghavan <go...@apache.org>.


> My guess is that the complex expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench actually uses more CPU than other expression.

If you have a -XX:+PrintAssembly dump (or run jmh with -prof perfasm), then we could see whether the JDK is autovectorizing that loop or not.

That's an If condition evaluation loop without a branch, which was specifically written for the JIT to speed it up.
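
As an illustration of what a branch-free If evaluation loop can look like, here is a minimal sketch. It is not claimed to be the exact Hive code; the class, method, and parameter names are hypothetical, and it assumes the condition column holds only 0 or 1.

```java
public class BranchlessSelect {
    /**
     * For each row, writes thenVals[i] when cond[i] == 1 and elseVals[i] when cond[i] == 0.
     * Assumption: cond contains only 0 or 1.
     */
    public static void select(long[] cond, long[] thenVals, long[] elseVals, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            // cond - 1 is 0 when cond is 1, and -1 (all bits set) when cond is 0.
            long elseMask = cond[i] - 1;
            // Exactly one of the two masks is all ones, so no branch is needed.
            out[i] = (~elseMask & thenVals[i]) | (elseMask & elseVals[i]);
        }
    }

    public static void main(String[] args) {
        long[] cond = {1, 0, 1};
        long[] thenVals = {10, 20, 30};
        long[] elseVals = {-1, -2, -3};
        long[] out = new long[3];
        select(cond, thenVals, elseVals, out, 3);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Because the loop body is pure arithmetic with no data-dependent branch, it is the kind of loop the JIT may be able to turn into SIMD instructions, which is what the assembly dump would confirm or rule out.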

Cheers,
Gopal

RE: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

Posted by "Zhang, Liyun" <li...@intel.com>.

Hi Gopal:
Thank you very much for your reply!
Do you mean that if I restrict VectorizedLogicBench.IfExprLongColumnLongColumnBench to a single CPU, the variation will be small? If so, that matches what I see: the variation became smaller after I used taskset -cp 1 $pid. But I am still confused: if all the tests in VectorizedLogicBench are well pipelined and vectorized, why do the other tests in VectorizedLogicBench show no large variation? My guess is that the complex expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench actually uses more CPU than the other expressions.

The expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprLongColumnLongColumn.java#L90





Best Regards
Kelly Zhang/Zhang,Liyun




 
-----Original Message-----
From: Gopal Vijayaraghavan [mailto:gopalv@apache.org] 
Sent: Thursday, November 16, 2017 5:40 AM
To: dev@hive.apache.org
Cc: Zhang, Liyun <li...@intel.com>; Teddy Choi <tc...@hortonworks.com>
Subject: Re: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

Hi,

>   You see that there is a great float for IfExprLongColumnLongColumnBench.bench, the  float is 583775 and the average value is 1621602. 

In my tests, the single-core tests tended to have huge variations on Intel with Turbo Boost.

CPU operations which are fast when stressing the CPU in single-threaded mode tended to get really slow when the other cores spin up and hit thermal limits.

For most memory-bound operations this is not easily visible, but the better pipelined and vectorized the loops get, the worse the impact of dynamic CPU frequency scaling.

Can you collect active CPU frequency when running this benchmark and do "taskset -c 1" to force the run to stick to a single CPU?
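
One possible way to collect the frequency is to sample the Linux cpufreq sysfs file for the pinned CPU while the benchmark runs under taskset. The sketch below is only an illustration: the class and method names are hypothetical, it assumes a Linux machine that exposes scaling_cur_freq (reported in kHz), and the sample count and one-second interval are arbitrary choices.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CpuFreqSampler {
    /** Converts the kHz text read from a cpufreq sysfs file to MHz. */
    public static double khzTextToMhz(String raw) {
        return Long.parseLong(raw.trim()) / 1000.0;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // CPU index to watch; defaults to 1 to match "taskset -c 1".
        int cpu = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        // Assumption: the kernel exposes the current frequency at this path.
        Path file = Paths.get("/sys/devices/system/cpu/cpu" + cpu + "/cpufreq/scaling_cur_freq");
        if (!Files.isReadable(file)) {
            System.err.println("Cannot read " + file);
            return;
        }
        // Take ten samples, one per second, while the benchmark runs elsewhere.
        for (int i = 0; i < 10; i++) {
            String raw = new String(Files.readAllBytes(file));
            System.out.printf("cpu%d: %.0f MHz%n", cpu, khzTextToMhz(raw));
            Thread.sleep(1000);
        }
    }
}
```

If the printed frequency drops noticeably partway through a run, that would be consistent with the frequency scaling explanation above.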

Cheers,
Gopal
