Posted to dev@jackrabbit.apache.org by "nfsantos (via GitHub)" <gi...@apache.org> on 2023/04/03 15:57:16 UTC

[GitHub] [jackrabbit-oak] nfsantos commented on a diff in pull request #886: OAK-10171: datastore-copy cmd: add checksum validation

nfsantos commented on code in PR #886:
URL: https://github.com/apache/jackrabbit-oak/pull/886#discussion_r1156156230


##########
oak-run/src/main/java/org/apache/jackrabbit/oak/run/Downloader.java:
##########
@@ -147,14 +163,49 @@ public ItemResponse call() throws Exception {
             sourceUrl.setConnectTimeout(Downloader.this.connectTimeoutMs);
             sourceUrl.setReadTimeout(Downloader.this.readTimeoutMs);
 
+            // Updating a MessageDigest from multiple threads is not thread safe, so we cannot reuse a single instance.
+            // Creating a new instance is a lightweight operation, no need to increase complexity by creating a pool.
+            MessageDigest md = null;
+            if (Downloader.this.checksumAlgorithm != null && item.checksum != null) {
+                md = MessageDigest.getInstance(Downloader.this.checksumAlgorithm);
+            }
+
             Path destinationPath = Paths.get(item.destination);
             Files.createDirectories(destinationPath.getParent());
-            long size;
+            long size = 0;
             try (ReadableByteChannel byteChannel = Channels.newChannel(sourceUrl.getInputStream());
                  FileOutputStream outputStream = new FileOutputStream(destinationPath.toFile())) {
-                size = outputStream.getChannel()
-                        .transferFrom(byteChannel, 0, Long.MAX_VALUE);
+                ByteBuffer buffer = ByteBuffer.allocate(1024);
+                int bytesRead;
+                while ((bytesRead = byteChannel.read(buffer)) != -1) {
+                    buffer.flip();
+                    if (md != null) {
+                        md.update(buffer);
+                    }
+                    size += bytesRead;
+                    outputStream.getChannel().write(buffer);
+                    buffer.clear();
+                }

Review Comment:
   Could you explain in more detail what these results mean?
   
   1.52-SNAPSHOT (8kb) | 58.71267322 | 357
   -- | -- | --
   1.52-SNAPSHOT (8kb, SHA-256) | 54.7603119 | 383
   
   Does the version without SHA-256 do any checksumming? If it does not, why is it slower than the version with SHA-256?
   
   And is 1.46 the current version, the one that uses `transferFrom`?
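
A note on the loop in the hunk above: `MessageDigest.update(ByteBuffer)` leaves the buffer's position equal to its limit, so a channel write issued right after it would find no remaining bytes unless the buffer is rewound first. The sketch below is one possible shape for a checksummed copy loop, not the code from the PR. The class name `ChecksumCopy`, the `copy` helper and the 8 KB buffer size are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ChecksumCopy {

    // Hypothetical helper (not part of Downloader): copies all bytes from
    // 'in' to 'out', feeding them to 'md' when a digest is supplied.
    public static long copy(ReadableByteChannel in, WritableByteChannel out, MessageDigest md)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192); // assumed buffer size
        long size = 0;
        while (in.read(buffer) != -1) {
            buffer.flip();
            size += buffer.remaining();
            if (md != null) {
                md.update(buffer); // consumes the buffer: position == limit afterwards
                buffer.rewind();   // reset position to 0 so the same bytes can be written
            }
            // A single write() is not guaranteed to drain the buffer.
            while (buffer.hasRemaining()) {
                out.write(buffer);
            }
            buffer.clear();
        }
        return size;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello oak".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // A fresh MessageDigest per copy, since instances are not thread safe.
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        long size = copy(Channels.newChannel(new ByteArrayInputStream(data)),
                Channels.newChannel(sink), md);
        System.out.println(size + " bytes copied, " + sink.size() + " bytes written");
    }
}
```

Passing `null` as the digest gives the copy-only path, which may help when comparing runs with and without checksumming on the same loop.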


