Posted to issues@lucene.apache.org by "Ankur (Jira)" <ji...@apache.org> on 2021/03/16 02:11:00 UTC
[jira] [Comment Edited] (LUCENE-9838) simd version of VectorUtil.dotProduct
[ https://issues.apache.org/jira/browse/LUCENE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302157#comment-17302157 ]
Ankur edited comment on LUCENE-9838 at 3/16/21, 2:10 AM:
---------------------------------------------------------
This is cool - [~rcmuir].
I played with this a little on my MacBook Pro (2019, *Memory*: 32 GB 2667 MHz DDR4; *Processor*: 2.6 GHz 6-Core Intel Core i7) after downloading [OpenJDK build 16+36-2231|https://download.java.net/java/GA/jdk16/7863447f0ab643c585b9bdebf67c69db/36/GPL/openjdk-16_osx-x64_bin.tar.gz] and setting up a standalone [JMH benchmark|https://github.com/openjdk/jmh] project.
I copied the old dotProduct implementation and the new one from your patch into _MyBenchmark.java_ in the JMH project. Here are the results I got:
{code:java}
Benchmark                  (size)   Mode  Cnt    Score   Error  Units
MyBenchmark.dotProductOld      16  thrpt    5   90.896 ± 5.302  ops/us
MyBenchmark.dotProductNew      16  thrpt    5  100.901 ± 5.105  ops/us
MyBenchmark.dotProductOld      32  thrpt    5   53.563 ± 2.378  ops/us
MyBenchmark.dotProductNew      32  thrpt    5   97.610 ± 5.393  ops/us
MyBenchmark.dotProductOld      64  thrpt    5   29.792 ± 1.246  ops/us
MyBenchmark.dotProductNew      64  thrpt    5   73.499 ± 3.640  ops/us
MyBenchmark.dotProductOld     128  thrpt    5   16.906 ± 0.751  ops/us
MyBenchmark.dotProductNew     128  thrpt    5   65.068 ± 3.986  ops/us
MyBenchmark.dotProductOld     256  thrpt    5    8.360 ± 0.125  ops/us
MyBenchmark.dotProductNew     256  thrpt    5   42.595 ± 2.958  ops/us
MyBenchmark.dotProductOld     512  thrpt    5    4.231 ± 0.158  ops/us
MyBenchmark.dotProductNew     512  thrpt    5   26.283 ± 0.640  ops/us
MyBenchmark.dotProductOld    1024  thrpt    5    2.104 ± 0.093  ops/us
MyBenchmark.dotProductNew    1024  thrpt    5   14.389 ± 0.720  ops/us
{code}
These benchmarks were run with annotations that disable TieredCompilation and the Vector API bounds check. For small vectors (*16 elements*) we see about a *10%* improvement, but for larger vectors (*128 or more* elements) the improvement is roughly *_4X or higher_* (about 3.8X at 128 elements, growing to nearly 7X at 1024).
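For reference, the scalar baseline benchmarked as dotProductOld boils down to a simple multiply-accumulate loop. This is only a sketch with names of my choosing, not Lucene's exact VectorUtil code (which also manually unrolls the loop):

```java
public class DotProductSketch {
    // Scalar dot product: one multiply-accumulate per element.
    // Assumes a.length == b.length, as VectorUtil's callers guarantee.
    static float dotProductOld(float[] a, float[] b) {
        float res = 0f;
        for (int i = 0; i < a.length; i++) {
            res += a[i] * b[i];
        }
        return res;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        // 1*4 + 2*5 + 3*6 = 32.0
        System.out.println(dotProductOld(a, b));
    }
}
```

In the JMH project, both implementations sit behind `@Benchmark` methods parameterized by the vector size shown in the table above.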
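For anyone who hasn't looked at the patch, the JDK 16 Vector API approach has roughly this shape. This is my sketch of the general technique, not the attached patch itself, and it needs the incubating module to compile and run (--add-modules jdk.incubator.vector):

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class SimdDotProductSketch {
    // Widest species the CPU supports, e.g. 8 float lanes on AVX2.
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dotProduct(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        // Main SIMD loop: multiply-accumulate SPECIES.length() lanes per iteration.
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = acc.add(va.mul(vb));
        }
        // Horizontal sum of the accumulator lanes, then a scalar tail loop.
        float res = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {
            res += a[i] * b[i];
        }
        return res;
    }
}
```

The scalar tail loop handles the leftover elements when the array length is not a multiple of the lane count, which is why the gains in the table are largest once the size is well past one vector width.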
> simd version of VectorUtil.dotProduct
> -------------------------------------
>
> Key: LUCENE-9838
> URL: https://issues.apache.org/jira/browse/LUCENE-9838
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Priority: Major
> Attachments: LUCENE-9838.patch
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Followup to LUCENE-9837
> Let's explore using JDK 16 vector API to speed this up more. It might be a hassle to try to MR-JAR/package up for users (adding commandline flags and stuff), but it gives good performance.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org