You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@maven.apache.org by "gnodet (via GitHub)" <gi...@apache.org> on 2023/04/05 09:45:35 UTC

[GitHub] [maven-build-cache-extension] gnodet opened a new pull request, #58: Add METRO hash implementation, change the default, fix the web site

gnodet opened a new pull request, #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58

   This PR provides new METRO / METRO_MM hash algorithms based on Zero-Allocation-Hashing Metro hash implementation.
   Given it's the fastest, the default is changed to it and the website modified accordingly.
   This is not very clear when/if people should change.
   Also, I would have assumed the `MM` versions of the hashes provide better performances, but the added perf test does not seem to indicate the 
   same:
   
   ```
   Benchmark           Mode  Cnt  Score   Error   Units
   PerfTest.METRO     thrpt    3  0.443 ± 0.051  ops/ms
   PerfTest.METRO_MM  thrpt    3  0.354 ± 0.176  ops/ms
   PerfTest.SHA1      thrpt    3  0.193 ± 0.058  ops/ms
   PerfTest.SHA256    thrpt    3  0.144 ± 0.056  ops/ms
   PerfTest.XX        thrpt    3  0.405 ± 0.242  ops/ms
   PerfTest.XXMM      thrpt    3  0.375 ± 0.170  ops/ms
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] qweek commented on pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "qweek (via GitHub)" <gi...@apache.org>.
qweek commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1503243437

   I have a couple more comments to your benchmark.
   1. Throughput benchmark does not show you real picture, because it depends on IO performance and total load on the disk. This means that results will most likely contain outliers, and averaged throughput will show the wrong value. It is better in this case to use Mode.SampleTime with percentiles.
   2. Benchmark results also depends on file size, disk cache, etc. Your benchmark includes have small test suite of 149 java files with total size less than 1 MB.
   I ran similar benchmark on Linux with random generated files 128B .. 128KB, 16 files for each size (total size ~4 MB).
   ```
   Benchmark                  Mode    Cnt    Score      Error  Units
   PerfTest.XX                sample  3232    2.783 ±   0.016  ms/op
   PerfTest.XX:XX·p0.00       sample          2.49             ms/op
   PerfTest.XX:XX·p0.50       sample          2.793            ms/op
   PerfTest.XX:XX·p0.90       sample          2.99             ms/op
   PerfTest.XX:XX·p0.95       sample          3.127            ms/op
   PerfTest.XX:XX·p0.99       sample          3.502            ms/op
   PerfTest.XX:XX·p0.999      sample          5.278            ms/op
   PerfTest.XX:XX·p0.9999     sample          5.833            ms/op
   PerfTest.XX:XX·p1.00       sample          5.833            ms/op
   
   Benchmark                  Mode    Cnt    Score      Error  Units
   PerfTest.XXMM              sample  2701    3.332 ±   0.014  ms/op
   PerfTest.XXMM:XXMM·p0.00   sample          3.211            ms/op
   PerfTest.XXMM:XXMM·p0.50   sample          3.297            ms/op
   PerfTest.XXMM:XXMM·p0.90   sample          3.457            ms/op
   PerfTest.XXMM:XXMM·p0.95   sample          3.502            ms/op
   PerfTest.XXMM:XXMM·p0.99   sample          3.715            ms/op
   PerfTest.XXMM:XXMM·p0.999  sample          6.555            ms/op
   PerfTest.XXMM:XXMM·p0.9999 sample         11.682            ms/op
   PerfTest.XXMM:XXMM·p1.00   sample         11.682            ms/op
   ```
   In this benchmark XX with Memory Mapped file was **6 - 29% slower**
   
   With random generated files 128KB .. 128MB, 16 files for each size (total size ~4 GB, which is more realistic suite in our case)
   ```
   Benchmark                  Mode    Cnt    Score     Error     Units
   PerfTest.XX                sample  3      3836.39 ± 309.251   ms/op
   PerfTest.XX:XX·p0.00       sample         3821.011            ms/op
   PerfTest.XX:XX·p0.50       sample         3833.594            ms/op
   PerfTest.XX:XX·p0.90       sample         3854.565            ms/op
   PerfTest.XX:XX·p0.95       sample         3854.565            ms/op
   PerfTest.XX:XX·p0.99       sample         3854.565            ms/op
   PerfTest.XX:XX·p0.999      sample         3854.565            ms/op
   PerfTest.XX:XX·p0.9999     sample         3854.565            ms/op
   PerfTest.XX:XX·p1.00       sample         3854.565            ms/op
   
   Benchmark                  Mode    Cnt    Score      Error    Units
   PerfTest.XXMM              sample  7      1523.132 ± 233.943  ms/op
   PerfTest.XXMM:XXMM·p0.00   sample         1413.48             ms/op
   PerfTest.XXMM:XXMM·p0.50   sample         1533.018            ms/op
   PerfTest.XXMM:XXMM·p0.90   sample         1663.042            ms/op
   PerfTest.XXMM:XXMM·p0.95   sample         1663.042            ms/op
   PerfTest.XXMM:XXMM·p0.99   sample         1663.042            ms/op
   PerfTest.XXMM:XXMM·p0.999  sample         1663.042            ms/op
   PerfTest.XXMM:XXMM·p0.9999 sample         1663.042            ms/op
   PerfTest.XXMM:XXMM·p1.00   sample         1663.042            ms/op
   ```
   In this benchmark XX with Memory Mapped file was **132 - 170% faster**
   
   You can adjust benchmark according to your load, but main problem is that test run takes a lot of time
   ```
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.Paths;
   import java.security.SecureRandom;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.UUID;
   
   public static class HashState {
       private static final String USER_DIRECTORY = System.getProperty("user.dir");
       private static List<Path> paths;
   
       @Setup(Level.Iteration)
       public void setup() {
   	paths = new ArrayList<>();
   
           startFileSize = Integer.parseInt(System.getProperty("from", "128"));
           endFileSize = Integer.parseInt(System.getProperty("to", "131072"));
   
           for (int fileSize = startFileSize; fileSize <= endFileSize; fileSize *= 2) {
               for (int i = 0; i < 16; i++) {
                   String fileName = fileSize + "B_" + UUID.randomUUID() + ".bin";
                   Path filePath = Paths.get(USER_DIRECTORY, "target", fileName);
   
                   generateRandomBinaryFile(filePath, fileSize);
                   paths.add(filePath);
               }
           }
       }
   
       private static void generateRandomBinaryFile(Path path, int fileSize) {
           SecureRandom random = new SecureRandom();
           byte[] buffer = new byte[fileSize];
           random.nextBytes(buffer);
   
           try {
               Files.createDirectories(path.getParent());
               Files.write(path, buffer);
               System.out.println("Random binary file generated successfully: " + path);
           } catch (IOException e) {
               System.err.println("Error while generating random binary file: " + e.getMessage());
           }
       }
   
       @TearDown(Level.Iteration)
       public void deleteAllPaths() {
           for (Path path : paths) {
               try {
                   Files.delete(path);
                   System.out.println("Deleted file: " + path);
               } catch (IOException e) {
                   System.err.println("Error while deleting file: " + path + ", " + e.getMessage());
               }
           }
       }
   }
   ```
   code generated by GPT-4 ©


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] olamy commented on pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "olamy (via GitHub)" <gi...@apache.org>.
olamy commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1539159212

   @gnodet happy to merge it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] bmarwell commented on a diff in pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "bmarwell (via GitHub)" <gi...@apache.org>.
bmarwell commented on code in PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#discussion_r1227069806


##########
src/main/java/org/apache/maven/buildcache/hash/Zah.java:
##########
@@ -20,43 +20,55 @@
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
 import java.nio.file.Files;
 import java.nio.file.Path;
 
 import net.openhft.hashing.LongHashFunction;
 
+import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
+import static java.nio.file.StandardOpenOption.READ;
+
 /**
- * XX
+ * Zero-Allocation-Hash based factory
  */
-public class XX implements Hash.Factory {
+public class Zah implements Hash.Factory {
+
+    private final String name;
+    private final LongHashFunction hash;
+    private final boolean useMemoryMappedBuffers;
 
-    static final LongHashFunction INSTANCE = LongHashFunction.xx();
+    public Zah(String name, LongHashFunction hash, boolean useMemoryMappedBuffers) {

Review Comment:
   -1 on the boolean parameter. What if you need to distinguish further? Better use a String or Enum.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] qweek commented on pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "qweek (via GitHub)" <gi...@apache.org>.
qweek commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1503420246

   XX algorithm is enough for most cases.
   MM optimized version may be useful for big projects with lots of module, files and dependencies.
   SHA is only needed if you stick to the classic time-tested and NSA verified algorithms (it always loses in speed)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] olamy commented on pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "olamy (via GitHub)" <gi...@apache.org>.
olamy commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1589018110

   > @olamy
   > 
   > > maybe would be good to add a comment regarding usage of `*-MM` implementations which need some extra setup with jdk17+
   > > In top directory create a file `.mvn/jvm.config` with a line `--add-opens java.base/sun.nio.ch=ALL-UNNAMED`
   > 
   > Do you recall why you indicate some requirements on jdk17+ ? The tests seem to run fine without this flag, so I'm not sure it's actually required. On a side note, I just noticed the build is broken on jdk >= 20, not sure why... but that seems completely unrelated to this issue.
   
   it's not requirement on jdk17. I'm just saying with 1.0.1 using `<hashAlgorithm>XXMM</hashAlgorithm>` and jdk 17 without this flag give me this error:
   ```
   Exception in thread "main" java.lang.IllegalAccessError: class net.openhft.hashing.LongHashFunction (in unnamed module @0x3e3861d7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3e3861d7
   	at net.openhft.hashing.LongHashFunction.hashByteBuffer(LongHashFunction.java:536)
   	at net.openhft.hashing.LongHashFunction.hashBytes(LongHashFunction.java:530)
   	at org.apache.maven.buildcache.hash.XX$Checksum.digest(XX.java:83)
   	at org.apache.maven.buildcache.hash.HashChecksum.digest(HashChecksum.java:58)
   
   ``` 
   ```
   olamy@pop-os:~/dev/sources/jetty/jetty.project$ mvn -v
   Apache Maven 3.9.2 (c9616018c7a021c1c39be70fb2843d6f5f9b8a1c)
   Maven home: /home/olamy/.sdkman/candidates/maven/current
   Java version: 17.0.7, vendor: Eclipse Adoptium, runtime: /home/olamy/.sdkman/candidates/java/17.0.7-tem
   Default locale: en_AU, platform encoding: UTF-8
   OS name: "linux", version: "6.2.6-76060206-generic", arch: "amd64", family: "unix"
   ```
   I'm using https://github.com/eclipse/jetty.project.git branch `jetty-12.0.x-build-cache` just remove the last entry from `.mvn/jvm.config`   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] qweek commented on pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "qweek (via GitHub)" <gi...@apache.org>.
qweek commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1501545211

   Hi, gnodet
   
   I don't mind adding a new hashing algorithm, but wouldn't make it the default.
   Last time original MetroHash repository was updated 5 years ago and is currently unmaintained.
   https://github.com/jandrewrogers/MetroHash
   Also, according to most known hash comparison benchmark MetroHash have same quality problems
   https://github.com/rurban/smhasher
   metrohash64_2: UB, LongNeighbors
   https://github.com/rurban/smhasher/blob/master/doc/metrohash64_2.txt
   We chose XX as most known, mature and fast option (with quality score 10)
   https://github.com/Cyan4973/xxHash
   
   Hash speed is highly dependent on operating system, as far as I remember in our tests on Linux, Memory Mapped version was faster, and on Windows it was slower (that's why we use both). It may also depend on the file system, CPU, etc
   Unfortunately, Zero-Allocation-Hashing repository does not contain performance benchmark for Metro
   https://github.com/OpenHFT/Zero-Allocation-Hashing/issues/28
   Smhasher benchmark is not relevant because it use C++ versions, but it says
   "Fastest hash functions on x86_64 without quality problems are:
   xxh3low
   wyhash
   ahash64
   t1ha2_atonce
   komihash
   FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
   halftime_hash128
   Spooky32
   pengyhash
   nmhash32
   mx3
   MUM/mir (different results on 32/64-bit archs, lots of bad seeds to filter out)
   fasthash32"
   P.S. it seemed to me that we tested version with XX3, but I'm not sure that it got into final release


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] olamy commented on pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "olamy (via GitHub)" <gi...@apache.org>.
olamy commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1524841734

   maybe would be good to add a comment regarding usage of `*-MM` implementations which need some extra setup with jdk17+
   In top directory create a file `.mvn/jvm.config` with a line `--add-opens java.base/sun.nio.ch=ALL-UNNAMED`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] gnodet commented on pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "gnodet (via GitHub)" <gi...@apache.org>.
gnodet commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1503361438

   It would be nice to provide some guidelines for choosing an algorithm. The fact they are missing is the original driver for this PR...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] gnodet commented on pull request #58: Add METRO hash implementation, change the default, fix the web site

Posted by "gnodet (via GitHub)" <gi...@apache.org>.
gnodet commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1588992345

   @olamy 
   
   > maybe would be good to add a comment regarding usage of `*-MM` implementations which need some extra setup with jdk17+
   > In top directory create a file `.mvn/jvm.config` with a line `--add-opens java.base/sun.nio.ch=ALL-UNNAMED`
   
   Do you recall why you indicate some requirements on jdk17+ ? The tests seem to run fine without this flag, so I'm not sure it's actually required.
   On a side note, I just noticed the build is broken on jdk >= 20, not sure why... but that seems completely unrelated to this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-build-cache-extension] gnodet merged pull request #58: [MCLEAN-109] Add METRO hash implementation

Posted by "gnodet (via GitHub)" <gi...@apache.org>.
gnodet merged PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org