Posted to issues@lucene.apache.org by "rmuir (via GitHub)" <gi...@apache.org> on 2023/10/28 03:50:01 UTC

[PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

rmuir opened a new pull request, #12731:
URL: https://github.com/apache/lucene/pull/12731

   The Intel FMA is nice, and it's easier to reason about when looking at the assembly. We basically reduce the error for free where it's available. Along with another change (reducing the unrolling for cosine, since it already has 3 FMA ops), we can speed up cosine from ~6 to ~8 ops/us.
   
   On ARM, FMA leads to a slight slowdown, so we don't use it there. It's not much, just something like 10%, but it seems like the wrong tradeoff.
   
   If you run the code with `-XX:-UseFMA` there's no slowdown, but no speedup either. And obviously, there are no changes for ARM here.
   
   ```
   Skylake AVX-256
   
   Main:
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt    5   0.624 ± 0.041  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt    5   5.988 ± 0.111  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt    5   1.959 ± 0.032  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt    5  12.058 ± 0.920  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt    5   1.422 ± 0.018  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt    5   9.837 ± 0.154  ops/us
   
   Patch:
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt    5   0.638 ± 0.006  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt    5   8.164 ± 0.084  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt    5   1.997 ± 0.027  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt    5  12.486 ± 0.163  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt    5   1.445 ± 0.014  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt    5  11.682 ± 0.129  ops/us
   
   Patch (with -jvmArgsAppend '-XX:-UseFMA'):
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt    5   0.641 ± 0.005  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt    5   6.102 ± 0.053  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt    5   1.997 ± 0.007  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt    5  12.177 ± 0.170  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt    5   1.450 ± 0.027  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt    5  10.464 ± 0.154  ops/us
   ```
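
   For illustration, here is a minimal, hedged sketch of what an FMA-based cosine kernel looks like with the Panama Vector API. This is not the actual PanamaVectorUtilSupport code (class and variable names are made up, and the unrolling and tail handling are simplified), but the three `fma()` calls in the loop body are the "3 fma ops" mentioned above; `fma()` maps to a fused multiply-add where the hardware does it fast, which is where the error reduction comes from. It needs `--add-modules jdk.incubator.vector`:
   
   ```java
   import jdk.incubator.vector.FloatVector;
   import jdk.incubator.vector.VectorOperators;
   import jdk.incubator.vector.VectorSpecies;
   
   final class CosineSketch {
     private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
   
     static float cosine(float[] a, float[] b) {
       FloatVector dot = FloatVector.zero(SPECIES);
       FloatVector normA = FloatVector.zero(SPECIES);
       FloatVector normB = FloatVector.zero(SPECIES);
       int i = 0;
       final int bound = SPECIES.loopBound(a.length);
       for (; i < bound; i += SPECIES.length()) {
         FloatVector va = FloatVector.fromArray(SPECIES, a, i);
         FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
         dot = va.fma(vb, dot);     // dot   += a*b  (single rounding step where fused)
         normA = va.fma(va, normA); // normA += a*a
         normB = vb.fma(vb, normB); // normB += b*b
       }
       float sum = dot.reduceLanes(VectorOperators.ADD);
       float nA = normA.reduceLanes(VectorOperators.ADD);
       float nB = normB.reduceLanes(VectorOperators.ADD);
       for (; i < a.length; i++) { // scalar tail for the remaining elements
         sum += a[i] * b[i];
         nA += a[i] * a[i];
         nB += b[i] * b[i];
       }
       return (float) (sum / Math.sqrt((double) nA * (double) nB));
     }
   }
   ```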
   




Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on code in PR #12731:
URL: https://github.com/apache/lucene/pull/12731#discussion_r1375252807


##########
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implements VectorUtilSupport {
         VectorizationProvider.TESTS_FORCE_INTEGER_VECTORS || (isAMD64withoutAVX2 == false);
   }
 
+  private static final String MANAGEMENT_FACTORY_CLASS = "java.lang.management.ManagementFactory";
+  private static final String HOTSPOT_BEAN_CLASS = "com.sun.management.HotSpotDiagnosticMXBean";
+
+  // best effort to see if FMA is fast (this is architecture-independent option)
+  private static boolean hasFastFMA() {
+    // on ARM cpus, FMA works fine but is a slight slowdown: don't use it.
+    if (Constants.OS_ARCH.equals("amd64") == false) {
+      return false;
+    }
+    try {
+      final Class<?> beanClazz = Class.forName(HOTSPOT_BEAN_CLASS);
+      // we use reflection for this, because the management factory is not part
+      // of Java 8's compact profile:
+      final Object hotSpotBean =

Review Comment:
   OK, that's fine, thanks for confirming that it works. The `requires static` allows access to the module (if available). So it looks like the code works fine.
   
   I think the only change I'd suggest is to add the FMA enablement to the logging message, as stated above.
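
   The diff above cuts off right at the reflective lookup, so for readers following along, here is a hedged, self-contained sketch of the pattern (the same one RamUsageEstimator uses): resolve the HotSpot diagnostic bean via `Class.forName`, ask it for the `UseFMA` VM option, and fall back to `false` if anything is missing. Names like `VMOptionSketch`/`readUseFMA` are illustrative, not the exact Lucene code:
   
   ```java
   import java.lang.reflect.Method;
   
   final class VMOptionSketch {
     private static final String MANAGEMENT_FACTORY_CLASS = "java.lang.management.ManagementFactory";
     private static final String HOTSPOT_BEAN_CLASS = "com.sun.management.HotSpotDiagnosticMXBean";
   
     /** Returns the value of the HotSpot UseFMA option, or false if it cannot be read. */
     static boolean readUseFMA() {
       try {
         // resolve the diagnostic bean reflectively, so there is no hard
         // compile-time dependency on the (optional) management modules
         final Class<?> beanClazz = Class.forName(HOTSPOT_BEAN_CLASS);
         final Object hotSpotBean =
             Class.forName(MANAGEMENT_FACTORY_CLASS)
                 .getMethod("getPlatformMXBean", Class.class)
                 .invoke(null, beanClazz);
         if (hotSpotBean == null) {
           return false;
         }
         // HotSpotDiagnosticMXBean#getVMOption("UseFMA") -> VMOption, then read its value
         final Object vmOption =
             beanClazz.getMethod("getVMOption", String.class).invoke(hotSpotBean, "UseFMA");
         final Method getValue = vmOption.getClass().getMethod("getValue");
         return Boolean.parseBoolean(getValue.invoke(vmOption).toString());
       } catch (ReflectiveOperationException | RuntimeException e) {
         return false; // module not readable, or the option does not exist on this JVM
       }
     }
   }
   ```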





Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on code in PR #12731:
URL: https://github.com/apache/lucene/pull/12731#discussion_r1375324223


##########
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implements VectorUtilSupport {
         VectorizationProvider.TESTS_FORCE_INTEGER_VECTORS || (isAMD64withoutAVX2 == false);
   }
 
+  private static final String MANAGEMENT_FACTORY_CLASS = "java.lang.management.ManagementFactory";
+  private static final String HOTSPOT_BEAN_CLASS = "com.sun.management.HotSpotDiagnosticMXBean";
+
+  // best effort to see if FMA is fast (this is architecture-independent option)
+  private static boolean hasFastFMA() {
+    // on ARM cpus, FMA works fine but is a slight slowdown: don't use it.
+    if (Constants.OS_ARCH.equals("amd64") == false) {
+      return false;
+    }
+    try {
+      final Class<?> beanClazz = Class.forName(HOTSPOT_BEAN_CLASS);
+      // we use reflection for this, because the management factory is not part
+      // of Java 8's compact profile:
+      final Object hotSpotBean =

Review Comment:
   I pushed the logging change.





Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on code in PR #12731:
URL: https://github.com/apache/lucene/pull/12731#discussion_r1375248973


##########
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implements VectorUtilSupport {
         VectorizationProvider.TESTS_FORCE_INTEGER_VECTORS || (isAMD64withoutAVX2 == false);
   }
 
+  private static final String MANAGEMENT_FACTORY_CLASS = "java.lang.management.ManagementFactory";
+  private static final String HOTSPOT_BEAN_CLASS = "com.sun.management.HotSpotDiagnosticMXBean";
+
+  // best effort to see if FMA is fast (this is architecture-independent option)
+  private static boolean hasFastFMA() {
+    // on ARM cpus, FMA works fine but is a slight slowdown: don't use it.
+    if (Constants.OS_ARCH.equals("amd64") == false) {
+      return false;
+    }
+    try {
+      final Class<?> beanClazz = Class.forName(HOTSPOT_BEAN_CLASS);
+      // we use reflection for this, because the management factory is not part
+      // of Java 8's compact profile:
+      final Object hotSpotBean =

Review Comment:
   Surely we can leave comments about the module system to another issue. It was somehow OK for RamUsageEstimator to do this, but not OK for the vectors code?
   
   Honestly, I haven't a clue about the module system (nor do I care), and I have no idea how it works or what `requires static` means or any of that. To me, it looks like more overengineered Java garbage (sorry). So I'm ill-equipped to be updating comments inside RamUsageEstimator. I just want to try to improve the vectorization here.





Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on PR #12731:
URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785549757

   > I think the Panama API should allow the user to figure out how many parallel units are available to somehow dynamically split work correctly.
   
   I'm not even sure OpenJDK/HotSpot knows this or even attempts to approximate it? It never deals with `-ffast-math`-style optimizations that would make use of it, due to its floating-point restrictions, right?
   
   But knowing the CPU info/model would help. Then at least folks could do it themselves.




Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #12731:
URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785453474

   > Last time I tried to figure out WTF was happening here, I think I determined that floating-point reproducibility was still preventing this from happening? That there isn't a "bail out" from this on the Vector API, just some clever wording in the javadocs of `reduceLanes`.
   > 
   > Which is really sad: how is the Vector API supposed to be usable if everyone has to unroll their own loops in order to use 100% of the hardware instead of 25%?
   
   The float use case is problematic because the order of multiplications/sums changes the result. So you can't easily rewrite the code to run in parallel, as the result would be different. This is also the reason why the auto-vectorizer can't do anything.
   
   I think the Panama API should allow the user to figure out how many parallel units are available to somehow dynamically split work correctly.
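
   A tiny illustrative example of that ordering sensitivity (float addition is not associative, so reassociating a reduction legally changes the answer):
   
   ```java
   public class FloatOrderDemo {
     public static void main(String[] args) {
       float a = 1e8f, b = -1e8f, c = 1f;
       System.out.println((a + b) + c); // prints 1.0
       System.out.println(a + (b + c)); // prints 0.0 (the 1f is lost below the ulp of 1e8f)
     }
   }
   ```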




Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on code in PR #12731:
URL: https://github.com/apache/lucene/pull/12731#discussion_r1375244023


##########
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implements VectorUtilSupport {
         VectorizationProvider.TESTS_FORCE_INTEGER_VECTORS || (isAMD64withoutAVX2 == false);
   }
 
+  private static final String MANAGEMENT_FACTORY_CLASS = "java.lang.management.ManagementFactory";
+  private static final String HOTSPOT_BEAN_CLASS = "com.sun.management.HotSpotDiagnosticMXBean";
+
+  // best effort to see if FMA is fast (this is architecture-independent option)
+  private static boolean hasFastFMA() {
+    // on ARM cpus, FMA works fine but is a slight slowdown: don't use it.
+    if (Constants.OS_ARCH.equals("amd64") == false) {
+      return false;
+    }
+    try {
+      final Class<?> beanClazz = Class.forName(HOTSPOT_BEAN_CLASS);
+      // we use reflection for this, because the management factory is not part
+      // of Java 8's compact profile:
+      final Object hotSpotBean =

Review Comment:
   It works with the module system at least: I tested it. If we want to move this code around I am fine with that, as long as I have a `static final` constant.





Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on PR #12731:
URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785178856

   Last time I tried to figure out WTF was happening here, I think I determined that floating-point reproducibility was still preventing this from happening? That there isn't a "bail out" from this on the Vector API, just some clever wording in the javadocs of `reduceLanes`.
   
   Which is really sad: how is the Vector API supposed to be usable if everyone has to unroll their own loops in order to use 100% of the hardware instead of 25%?
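
   For concreteness, a hedged sketch (illustrative names only, not Lucene's actual kernels) of the kind of manual unrolling meant here: independent accumulators break the loop-carried dependency on a single sum, so the CPU's multiple FMA units can actually be kept busy. Note that combining the accumulators at the end changes the summation order, which is exactly the reproducibility issue discussed above:
   
   ```java
   import jdk.incubator.vector.FloatVector;
   import jdk.incubator.vector.VectorOperators;
   import jdk.incubator.vector.VectorSpecies;
   
   final class UnrolledDotSketch {
     private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
   
     static float dotProduct(float[] a, float[] b) {
       FloatVector acc0 = FloatVector.zero(SPECIES);
       FloatVector acc1 = FloatVector.zero(SPECIES);
       FloatVector acc2 = FloatVector.zero(SPECIES);
       FloatVector acc3 = FloatVector.zero(SPECIES);
       final int step = SPECIES.length();
       int i = 0;
       final int bound = a.length - (a.length % (4 * step));
       for (; i < bound; i += 4 * step) {
         // four independent chains: no accumulator waits on another
         acc0 = FloatVector.fromArray(SPECIES, a, i).fma(FloatVector.fromArray(SPECIES, b, i), acc0);
         acc1 = FloatVector.fromArray(SPECIES, a, i + step).fma(FloatVector.fromArray(SPECIES, b, i + step), acc1);
         acc2 = FloatVector.fromArray(SPECIES, a, i + 2 * step).fma(FloatVector.fromArray(SPECIES, b, i + 2 * step), acc2);
         acc3 = FloatVector.fromArray(SPECIES, a, i + 3 * step).fma(FloatVector.fromArray(SPECIES, b, i + 3 * step), acc3);
       }
       float dot = acc0.add(acc1).add(acc2.add(acc3)).reduceLanes(VectorOperators.ADD);
       for (; i < a.length; i++) {
         dot += a[i] * b[i]; // scalar tail
       }
       return dot;
     }
   }
   ```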
   




Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on PR #12731:
URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785145823

   > Ha! So just removing the overly aggressive unrolling in cosine improves things.
   
   Well, only in combination with the switch to FMA. It seems then it's able to keep the CPU busy multiplying.
   




Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on PR #12731:
URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785163931

   > .. and yes (I've not forgotten), we need something like a `java.lang.Architecture/Platform`, that is queryable for such low-level support (rather than resorting to beans - which actually works kinda ok, but is not ideal)
   
   And the compiler should be fixed to unroll basic loops to take advantage of the fact that you can do 4 of these things in parallel on modern CPUs.
   
   Or, failing that, if I'm going to have to unroll loops myself, then at least give me some basic info (e.g. the CPU model) so I can do it properly.
   
   Currently it is the worst of both worlds.




Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on PR #12731:
URL: https://github.com/apache/lucene/pull/12731#issuecomment-1783869625

   .. and yes (I've not forgotten), we need something like a `java.lang.Architecture/Platform`, that is queryable for such low-level support (rather than resorting to beans - which actually works kinda ok, but is not ideal)




Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "asfgit (via GitHub)" <gi...@apache.org>.
asfgit merged PR #12731:
URL: https://github.com/apache/lucene/pull/12731




Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on code in PR #12731:
URL: https://github.com/apache/lucene/pull/12731#discussion_r1375243087


##########
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implements VectorUtilSupport {
         VectorizationProvider.TESTS_FORCE_INTEGER_VECTORS || (isAMD64withoutAVX2 == false);
   }
 
+  private static final String MANAGEMENT_FACTORY_CLASS = "java.lang.management.ManagementFactory";
+  private static final String HOTSPOT_BEAN_CLASS = "com.sun.management.HotSpotDiagnosticMXBean";
+
+  // best effort to see if FMA is fast (this is architecture-independent option)
+  private static boolean hasFastFMA() {
+    // on ARM cpus, FMA works fine but is a slight slowdown: don't use it.
+    if (Constants.OS_ARCH.equals("amd64") == false) {
+      return false;
+    }
+    try {
+      final Class<?> beanClazz = Class.forName(HOTSPOT_BEAN_CLASS);
+      // we use reflection for this, because the management factory is not part
+      // of Java 8's compact profile:
+      final Object hotSpotBean =

Review Comment:
   Haha, I know this code from the RamUsageEstimator code. The comment should possibly be updated in both places to mention the module system and that the module is optional there.
   
   This module is declared optional in our module-info: https://github.com/apache/lucene/blob/f5776c88449ff16f7347ccbe6e26e5bddd8c94f7/lucene/core/src/java/module-info.java#L26
   
   So basically this code is fine; we do not want to hardcode the module (as it is not part of the JDK platform standard). Maybe we should also add "FMA enabled" to the logger message. That should be easy by making the flag package-private and referring to it from the initialization code where the log message is printed: https://github.com/apache/lucene/blob/f5776c88449ff16f7347ccbe6e26e5bddd8c94f7/lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L60-L67
   
   Let's add the same with `PanamaVectorUtilSupport.HAS_FAST_FMA ? "; FMA enabled" : ""`
   
   We should maybe move this code to some common class in the utils package (like `Constants#getVMOption(String name)`). We can create a separate PR for that.
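
   For illustration, the logging tweak could look roughly like the following. This is a hedged sketch only: the real message lives in PanamaVectorizationProvider and its exact wording differs, and `HAS_FAST_FMA` is the package-private flag proposed above; only the appended ternary comes from this comment:
   
   ```java
   import java.util.logging.Logger;
   
   final class LoggingSketch {
     // stand-in for the package-private PanamaVectorUtilSupport.HAS_FAST_FMA flag
     static final boolean HAS_FAST_FMA = true;
   
     static void logVectorizationStatus() {
       Logger log = Logger.getLogger(LoggingSketch.class.getName());
       // append "; FMA enabled" only when the fast-FMA path was detected
       log.info("Java vector incubator API enabled" + (HAS_FAST_FMA ? "; FMA enabled" : ""));
     }
   }
   ```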





Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on code in PR #12731:
URL: https://github.com/apache/lucene/pull/12731#discussion_r1375278303


##########
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implements VectorUtilSupport {
         VectorizationProvider.TESTS_FORCE_INTEGER_VECTORS || (isAMD64withoutAVX2 == false);
   }
 
+  private static final String MANAGEMENT_FACTORY_CLASS = "java.lang.management.ManagementFactory";
+  private static final String HOTSPOT_BEAN_CLASS = "com.sun.management.HotSpotDiagnosticMXBean";
+
+  // best effort to see if FMA is fast (this is architecture-independent option)
+  private static boolean hasFastFMA() {
+    // on ARM cpus, FMA works fine but is a slight slowdown: don't use it.
+    if (Constants.OS_ARCH.equals("amd64") == false) {
+      return false;
+    }
+    try {
+      final Class<?> beanClazz = Class.forName(HOTSPOT_BEAN_CLASS);
+      // we use reflection for this, because the management factory is not part
+      // of Java 8's compact profile:
+      final Object hotSpotBean =

Review Comment:
   > Let's add the same with PanamaVectorUtilSupport.HAS_FAST_FMA ? "; FMA enabled" : ""
   
   ++





Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on PR #12731:
URL: https://github.com/apache/lucene/pull/12731#issuecomment-1783869078

   Ha! So just removing the overly aggressive unrolling in cosine improves things. The check on FMA is nice - I had similar thoughts (you just beat me to it!), and it inlines nicely. I also agree we don't want to use FMA on ARM; it performs 10-15% worse on my M2.
   
   Sanity results from my Rocket Lake:
   
   main:
   ```
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt    5   0.845 ± 0.001  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt    5   8.885 ± 0.005  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt    5   3.406 ± 0.018  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt    5  26.168 ± 0.009  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt    5   2.549 ± 0.005  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt    5  19.283 ± 0.001  ops/us
   ```
   
   Robert's branch:
   ```
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt    5   0.845 ± 0.003  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt    5  14.636 ± 0.016  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt    5   3.400 ± 0.083  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt    5  27.265 ± 0.065  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt    5   2.548 ± 0.012  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt    5  25.529 ± 0.207  ops/us
   ```

