You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@maven.apache.org by "gnodet (via GitHub)" <gi...@apache.org> on 2023/04/05 09:45:35 UTC
[GitHub] [maven-build-cache-extension] gnodet opened a new pull request, #58: Add METRO hash implementation, change the default, fix the web site
gnodet opened a new pull request, #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58
This PR provides new METRO / METRO_MM hash algorithms based on Zero-Allocation-Hashing Metro hash implementation.
Given it's the fastest, the default is changed to it and the website modified accordingly.
This is not very clear when/if people should change.
Also, I would have assumed the `MM` versions of the hashes provide better performances, but the added perf test does not seem to indicate the
same:
```
Benchmark Mode Cnt Score Error Units
PerfTest.METRO thrpt 3 0.443 ± 0.051 ops/ms
PerfTest.METRO_MM thrpt 3 0.354 ± 0.176 ops/ms
PerfTest.SHA1 thrpt 3 0.193 ± 0.058 ops/ms
PerfTest.SHA256 thrpt 3 0.144 ± 0.056 ops/ms
PerfTest.XX thrpt 3 0.405 ± 0.242 ops/ms
PerfTest.XXMM thrpt 3 0.375 ± 0.170 ops/ms
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] qweek commented on pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "qweek (via GitHub)" <gi...@apache.org>.
qweek commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1503243437
I have a couple more comments to your benchmark.
1. Throughput benchmark does not show you real picture, because it depends on IO performance and total load on the disk. This means that results will most likely contain outliers, and averaged throughput will show the wrong value. It is better in this case to use Mode.SampleTime with percentiles.
2. Benchmark results also depends on file size, disk cache, etc. Your benchmark includes have small test suite of 149 java files with total size less than 1 MB.
I ran similar benchmark on Linux with random generated files 128B .. 128KB, 16 files for each size (total size ~4 MB).
```
Benchmark Mode Cnt Score Error Units
PerfTest.XX sample 3232 2.783 ± 0.016 ms/op
PerfTest.XX:XX·p0.00 sample 2.49 ms/op
PerfTest.XX:XX·p0.50 sample 2.793 ms/op
PerfTest.XX:XX·p0.90 sample 2.99 ms/op
PerfTest.XX:XX·p0.95 sample 3.127 ms/op
PerfTest.XX:XX·p0.99 sample 3.502 ms/op
PerfTest.XX:XX·p0.999 sample 5.278 ms/op
PerfTest.XX:XX·p0.9999 sample 5.833 ms/op
PerfTest.XX:XX·p1.00 sample 5.833 ms/op
Benchmark Mode Cnt Score Error Units
PerfTest.XXMM sample 2701 3.332 ± 0.014 ms/op
PerfTest.XXMM:XXMM·p0.00 sample 3.211 ms/op
PerfTest.XXMM:XXMM·p0.50 sample 3.297 ms/op
PerfTest.XXMM:XXMM·p0.90 sample 3.457 ms/op
PerfTest.XXMM:XXMM·p0.95 sample 3.502 ms/op
PerfTest.XXMM:XXMM·p0.99 sample 3.715 ms/op
PerfTest.XXMM:XXMM·p0.999 sample 6.555 ms/op
PerfTest.XXMM:XXMM·p0.9999 sample 11.682 ms/op
PerfTest.XXMM:XXMM·p1.00 sample 11.682 ms/op
```
In this benchmark XX with Memory Mapped file was **6 - 29% slower**
With random generated files 128KB .. 128MB, 16 files for each size (total size ~4 GB, which is more realistic suite in our case)
```
Benchmark Mode Cnt Score Error Units
PerfTest.XX sample 3 3836.39 ± 309.251 ms/op
PerfTest.XX:XX·p0.00 sample 3821.011 ms/op
PerfTest.XX:XX·p0.50 sample 3833.594 ms/op
PerfTest.XX:XX·p0.90 sample 3854.565 ms/op
PerfTest.XX:XX·p0.95 sample 3854.565 ms/op
PerfTest.XX:XX·p0.99 sample 3854.565 ms/op
PerfTest.XX:XX·p0.999 sample 3854.565 ms/op
PerfTest.XX:XX·p0.9999 sample 3854.565 ms/op
PerfTest.XX:XX·p1.00 sample 3854.565 ms/op
Benchmark Mode Cnt Score Error Units
PerfTest.XXMM sample 7 1523.132 ± 233.943 ms/op
PerfTest.XXMM:XXMM·p0.00 sample 1413.48 ms/op
PerfTest.XXMM:XXMM·p0.50 sample 1533.018 ms/op
PerfTest.XXMM:XXMM·p0.90 sample 1663.042 ms/op
PerfTest.XXMM:XXMM·p0.95 sample 1663.042 ms/op
PerfTest.XXMM:XXMM·p0.99 sample 1663.042 ms/op
PerfTest.XXMM:XXMM·p0.999 sample 1663.042 ms/op
PerfTest.XXMM:XXMM·p0.9999 sample 1663.042 ms/op
PerfTest.XXMM:XXMM·p1.00 sample 1663.042 ms/op
```
In this benchmark XX with Memory Mapped file was **132 - 170% faster**
You can adjust benchmark according to your load, but main problem is that test run takes a lot of time
```
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
public static class HashState {
private static final String USER_DIRECTORY = System.getProperty("user.dir");
private static List<Path> paths;
@Setup(Level.Iteration)
public void setup() {
paths = new ArrayList<>();
startFileSize = Integer.parseInt(System.getProperty("from", "128"));
endFileSize = Integer.parseInt(System.getProperty("to", "131072"));
for (int fileSize = startFileSize; fileSize <= endFileSize; fileSize *= 2) {
for (int i = 0; i < 16; i++) {
String fileName = fileSize + "B_" + UUID.randomUUID() + ".bin";
Path filePath = Paths.get(USER_DIRECTORY, "target", fileName);
generateRandomBinaryFile(filePath, fileSize);
paths.add(filePath);
}
}
}
private static void generateRandomBinaryFile(Path path, int fileSize) {
SecureRandom random = new SecureRandom();
byte[] buffer = new byte[fileSize];
random.nextBytes(buffer);
try {
Files.createDirectories(path.getParent());
Files.write(path, buffer);
System.out.println("Random binary file generated successfully: " + path);
} catch (IOException e) {
System.err.println("Error while generating random binary file: " + e.getMessage());
}
}
@TearDown(Level.Iteration)
public void deleteAllPaths() {
for (Path path : paths) {
try {
Files.delete(path);
System.out.println("Deleted file: " + path);
} catch (IOException e) {
System.err.println("Error while deleting file: " + path + ", " + e.getMessage());
}
}
}
}
```
code generated by GPT-4 ©
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] olamy commented on pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "olamy (via GitHub)" <gi...@apache.org>.
olamy commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1539159212
@gnodet happy to merge it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] bmarwell commented on a diff in pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "bmarwell (via GitHub)" <gi...@apache.org>.
bmarwell commented on code in PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#discussion_r1227069806
##########
src/main/java/org/apache/maven/buildcache/hash/Zah.java:
##########
@@ -20,43 +20,55 @@
import java.io.IOException;
import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import net.openhft.hashing.LongHashFunction;
+import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
+import static java.nio.file.StandardOpenOption.READ;
+
/**
- * XX
+ * Zero-Allocation-Hash based factory
*/
-public class XX implements Hash.Factory {
+public class Zah implements Hash.Factory {
+
+ private final String name;
+ private final LongHashFunction hash;
+ private final boolean useMemoryMappedBuffers;
- static final LongHashFunction INSTANCE = LongHashFunction.xx();
+ public Zah(String name, LongHashFunction hash, boolean useMemoryMappedBuffers) {
Review Comment:
-1 on the boolean parameter. What if you need to distinguish further? Better use a String or Enum.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] qweek commented on pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "qweek (via GitHub)" <gi...@apache.org>.
qweek commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1503420246
XX algorithm is enough for most cases.
MM optimized version may be useful for big projects with lots of module, files and dependencies.
SHA is only needed if you stick to the classic time-tested and NSA verified algorithms (it always loses in speed)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] olamy commented on pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "olamy (via GitHub)" <gi...@apache.org>.
olamy commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1589018110
> @olamy
>
> > maybe would be good to add a comment regarding usage of `*-MM` implementations which need some extra setup with jdk17+
> > In top directory create a file `.mvn/jvm.config` with a line `--add-opens java.base/sun.nio.ch=ALL-UNNAMED`
>
> Do you recall why you indicate some requirements on jdk17+ ? The tests seem to run fine without this flag, so I'm not sure it's actually required. On a side note, I just noticed the build is broken on jdk >= 20, not sure why... but that seems completely unrelated to this issue.
it's not requirement on jdk17. I'm just saying with 1.0.1 using `<hashAlgorithm>XXMM</hashAlgorithm>` and jdk 17 without this flag give me this error:
```
Exception in thread "main" java.lang.IllegalAccessError: class net.openhft.hashing.LongHashFunction (in unnamed module @0x3e3861d7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3e3861d7
at net.openhft.hashing.LongHashFunction.hashByteBuffer(LongHashFunction.java:536)
at net.openhft.hashing.LongHashFunction.hashBytes(LongHashFunction.java:530)
at org.apache.maven.buildcache.hash.XX$Checksum.digest(XX.java:83)
at org.apache.maven.buildcache.hash.HashChecksum.digest(HashChecksum.java:58)
```
```
olamy@pop-os:~/dev/sources/jetty/jetty.project$ mvn -v
Apache Maven 3.9.2 (c9616018c7a021c1c39be70fb2843d6f5f9b8a1c)
Maven home: /home/olamy/.sdkman/candidates/maven/current
Java version: 17.0.7, vendor: Eclipse Adoptium, runtime: /home/olamy/.sdkman/candidates/java/17.0.7-tem
Default locale: en_AU, platform encoding: UTF-8
OS name: "linux", version: "6.2.6-76060206-generic", arch: "amd64", family: "unix"
```
I'm using https://github.com/eclipse/jetty.project.git branch `jetty-12.0.x-build-cache` just remove the last entry from `.mvn/jvm.config`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] qweek commented on pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "qweek (via GitHub)" <gi...@apache.org>.
qweek commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1501545211
Hi, gnodet
I don't mind adding a new hashing algorithm, but wouldn't make it the default.
Last time original MetroHash repository was updated 5 years ago and is currently unmaintained.
https://github.com/jandrewrogers/MetroHash
Also, according to most known hash comparison benchmark MetroHash have same quality problems
https://github.com/rurban/smhasher
metrohash64_2: UB, LongNeighbors
https://github.com/rurban/smhasher/blob/master/doc/metrohash64_2.txt
We chose XX as most known, mature and fast option (with quality score 10)
https://github.com/Cyan4973/xxHash
Hash speed is highly dependent on operating system, as far as I remember in our tests on Linux, Memory Mapped version was faster, and on Windows it was slower (that's why we use both). It may also depend on the file system, CPU, etc
Unfortunately, Zero-Allocation-Hashing repository does not contain performance benchmark for Metro
https://github.com/OpenHFT/Zero-Allocation-Hashing/issues/28
Smhasher benchmark is not relevant because it use C++ versions, but it says
"Fastest hash functions on x86_64 without quality problems are:
xxh3low
wyhash
ahash64
t1ha2_atonce
komihash
FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
halftime_hash128
Spooky32
pengyhash
nmhash32
mx3
MUM/mir (different results on 32/64-bit archs, lots of bad seeds to filter out)
fasthash32"
P.S. it seemed to me that we tested version with XX3, but I'm not sure that it got into final release
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] olamy commented on pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "olamy (via GitHub)" <gi...@apache.org>.
olamy commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1524841734
maybe would be good to add a comment regarding usage of `*-MM` implementations which need some extra setup with jdk17+
In top directory create a file `.mvn/jvm.config` with a line `--add-opens java.base/sun.nio.ch=ALL-UNNAMED`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] gnodet commented on pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "gnodet (via GitHub)" <gi...@apache.org>.
gnodet commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1503361438
It would be nice to provide some guidelines for choosing an algorithm. The fact they are missing is the original driver for this PR...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] gnodet commented on pull request #58: Add METRO hash implementation, change the default, fix the web site
Posted by "gnodet (via GitHub)" <gi...@apache.org>.
gnodet commented on PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58#issuecomment-1588992345
@olamy
> maybe would be good to add a comment regarding usage of `*-MM` implementations which need some extra setup with jdk17+
> In top directory create a file `.mvn/jvm.config` with a line `--add-opens java.base/sun.nio.ch=ALL-UNNAMED`
Do you recall why you indicate some requirements on jdk17+ ? The tests seem to run fine without this flag, so I'm not sure it's actually required.
On a side note, I just noticed the build is broken on jdk >= 20, not sure why... but that seems completely unrelated to this issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [maven-build-cache-extension] gnodet merged pull request #58: [MCLEAN-109] Add METRO hash implementation
Posted by "gnodet (via GitHub)" <gi...@apache.org>.
gnodet merged PR #58:
URL: https://github.com/apache/maven-build-cache-extension/pull/58
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org