You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@maven.apache.org by GitBox <gi...@apache.org> on 2020/05/03 01:07:06 UTC

[GitHub] [maven-shared-utils] roxspring opened a new pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

roxspring opened a new pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28


   - Added unit tests for `FileUtils.copyFile()` methods using `FilterWrapper`
   - When filtering and _potentially_ overwriting an existing file:
     - Open the existing file as a `RandomAccessFile`
     - Encode each buffer of content
     - Compare encoded buffer with a matching buffer from the existing file
     - If different then overwrite the remainder of the file  
   - Added unit tests to demonstrate that lastModified date is only updated when the content changes.
   - Rudimentary performance tests with various source sizes show no significant impact:
   
     Source Size | Avg Millis (Old) | Avg Millis (New)
     -- | -- | --
     100 MB | 6034 | 5940
     1 MB | 54 | 56
     10 KB | 1 | 8
   
   Relates to MSHARED-884


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] roxspring commented on pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
roxspring commented on pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#issuecomment-629534673


   @michael-o I think I've fixed the Buffer code as you requested - I'd be grateful of another review!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] michael-o edited a comment on pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
michael-o edited a comment on pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#issuecomment-634305327


   > 
   > 
   > > Does it actually make sense to introduce a threshold? E.g., > 1 MB?
   > 
   > You're suggesting that we apply the new logic only to input >1MB and for smaller files we presumably process them in memory and compare that to the existing written file, right?
   > 
   > In general I dislike maintaining multiple equivalent codepaths and switching between them conditionally as typically one is used far more often and the others languish in bit rot, becoming a maintenance burden. Sometimes this approach is necessary to gain performance in different circumstances, but is that clearly the case here?
   > 
   > I'm not sure it is. For a start I don't remember what rounding errors might have been involved in the 1 vs 8 millisecond comparison from the PR description, but more importantly that's comparing broken to fixed behaviour - the 1 millisecond figure timing is for the existing "unconditionally overwrite when filtering", and we don't yet have comparable numbers for "process them in memory and compare that to the existing written file".
   > 
   > If an additional 7 milliseconds per filtered resource is enough to cause panic then I guess I can look into a thresholded approach but my inclination is to avoid that effort until it proves necessary.
   
   Well reasoned, let's leave it as-is. I will proceed with the review Friday. Please ping me by then!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] michael-o commented on a change in pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
michael-o commented on a change in pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#discussion_r430702922



##########
File path: src/test/java/org/apache/maven/shared/utils/io/FileUtilsTest.java
##########
@@ -441,9 +449,300 @@ public void copyFile1()
             testFile1.lastModified() == destination.lastModified());*/
     }
 
+    private static long MILLIS_PER_DAY = 24 * 60 * 60 * 1000L;

Review comment:
       Why not use TimeUnit for this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] roxspring commented on pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
roxspring commented on pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#issuecomment-623106474


   Yes. That’s the point, as discussed in the jira issue and dev mailing lists. Unnecessarily  modifying the file when the filter content hasn’t changed can lead to wasted work downstream. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] michael-o commented on pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
michael-o commented on pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#issuecomment-623105263


   What is the general benefit you expect here? Not changing last modified date?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] olamy commented on pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
olamy commented on pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#issuecomment-634374708


   merged via e57180309a64758e161513d80f1185e13dc84826


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] roxspring commented on a change in pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
roxspring commented on a change in pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#discussion_r430700704



##########
File path: src/main/java/org/apache/maven/shared/utils/io/FileUtils.java
##########
@@ -1810,34 +1816,101 @@ public static void copyFile( @Nonnull File from, @Nonnull File to, @Nullable Str
                                  @Nullable FilterWrapper[] wrappers, boolean overwrite )
         throws IOException
     {
-        if ( wrappers != null && wrappers.length > 0 )
+        if ( wrappers == null || wrappers.length == 0 )
         {
-            
-            if ( encoding == null || encoding.isEmpty() ) 
+            if ( overwrite || to.lastModified() < from.lastModified() )
+            {
+                copyFile( from, to );
+            }
+        }
+        else
+        {
+            if ( encoding == null || encoding.isEmpty() )
             {
                 encoding = Charset.defaultCharset().name();
             }
-            
+
             // buffer so it isn't reading a byte at a time!
             try ( Reader fileReader =
-                new BufferedReader( new InputStreamReader( new FileInputStream( from ), encoding ) );
-                            Writer fileWriter = new OutputStreamWriter( new FileOutputStream( to ), encoding ) )
+                    new BufferedReader( new InputStreamReader( new FileInputStream( from ), encoding ) ) )

Review comment:
       Done.

##########
File path: src/test/java/org/apache/maven/shared/utils/io/FileUtilsTest.java
##########
@@ -441,9 +449,300 @@ public void copyFile1()
             testFile1.lastModified() == destination.lastModified());*/
     }
 
+    private static long MILLIS_PER_DAY = 24 * 60 * 60 * 1000L;

Review comment:
       Happy to use `TimeUnit.DAYS.toMillis(1)` instead. Assumed `ChronoUnit` and friends weren't available but had forgotten `TimeUnit`!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] roxspring commented on pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
roxspring commented on pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#issuecomment-634302632


   > Does it actually make sense to introduce a threshold? E.g., > 1 MB?
   
   You're suggesting that we apply the new logic only to input >1MB and for smaller files we presumably process them in memory and compare that to the existing written file, right?
   
   In general I dislike maintaining multiple equivalent codepaths and switching between them conditionally as typically one is used far more often and the others languish in bit rot, becoming a maintenance burden. Sometimes this approach is necessary to gain performance in different circumstances, but is that clearly the case here?
   
   I'm not sure it is. For a start I don't remember what rounding errors might have been involved in the 1 vs 8 millisecond comparison from the PR description, but more importantly that's comparing broken to fixed behaviour - the 1 millisecond figure timing is for the existing "unconditionally overwrite when filtering", and we don't yet have comparable numbers for "process them in memory and compare that to the existing written file".
   
   If an additional 7 milliseconds per filtered resource is enough to cause panic then I guess I can look into a thresholded approach but my inclination is to avoid that effort until it proves necessary.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] michael-o commented on pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
michael-o commented on pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#issuecomment-634305327


   > 
   > 
   > > Does it actually make sense to introduce a threshold? E.g., > 1 MB?
   > 
   > You're suggesting that we apply the new logic only to input >1MB and for smaller files we presumably process them in memory and compare that to the existing written file, right?
   > 
   > In general I dislike maintaining multiple equivalent codepaths and switching between them conditionally as typically one is used far more often and the others languish in bit rot, becoming a maintenance burden. Sometimes this approach is necessary to gain performance in different circumstances, but is that clearly the case here?
   > 
   > I'm not sure it is. For a start I don't remember what rounding errors might have been involved in the 1 vs 8 millisecond comparison from the PR description, but more importantly that's comparing broken to fixed behaviour - the 1 millisecond figure timing is for the existing "unconditionally overwrite when filtering", and we don't yet have comparable numbers for "process them in memory and compare that to the existing written file".
   > 
   > If an additional 7 milliseconds per filtered resource is enough to cause panic then I guess I can look into a thresholded approach but my inclination is to avoid that effort until it proves necessary.
   
   Well reasoned, let's leave it as-is.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] olamy commented on a change in pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
olamy commented on a change in pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#discussion_r430111873



##########
File path: src/main/java/org/apache/maven/shared/utils/io/FileUtils.java
##########
@@ -1810,34 +1816,101 @@ public static void copyFile( @Nonnull File from, @Nonnull File to, @Nullable Str
                                  @Nullable FilterWrapper[] wrappers, boolean overwrite )
         throws IOException
     {
-        if ( wrappers != null && wrappers.length > 0 )
+        if ( wrappers == null || wrappers.length == 0 )
         {
-            
-            if ( encoding == null || encoding.isEmpty() ) 
+            if ( overwrite || to.lastModified() < from.lastModified() )
+            {
+                copyFile( from, to );
+            }
+        }
+        else
+        {
+            if ( encoding == null || encoding.isEmpty() )
             {
                 encoding = Charset.defaultCharset().name();
             }
-            
+
             // buffer so it isn't reading a byte at a time!
             try ( Reader fileReader =
-                new BufferedReader( new InputStreamReader( new FileInputStream( from ), encoding ) );
-                            Writer fileWriter = new OutputStreamWriter( new FileOutputStream( to ), encoding ) )
+                    new BufferedReader( new InputStreamReader( new FileInputStream( from ), encoding ) ) )
             {
-
                 Reader wrapped = fileReader;
                 for ( FilterWrapper wrapper : wrappers )
                 {
                     wrapped = wrapper.getReader( wrapped );
                 }
 
-                IOUtil.copy( wrapped, fileWriter );
-            }
-        }
-        else
-        {
-            if ( to.lastModified() < from.lastModified() || overwrite )
-            {
-                copyFile( from, to );
+                if ( overwrite || !to.exists() )
+                {
+                    try ( Writer fileWriter = new OutputStreamWriter( new FileOutputStream( to ), encoding ) )

Review comment:
       what about `Files.newBufferedWriter(  )`

##########
File path: src/main/java/org/apache/maven/shared/utils/io/FileUtils.java
##########
@@ -1810,34 +1816,101 @@ public static void copyFile( @Nonnull File from, @Nonnull File to, @Nullable Str
                                  @Nullable FilterWrapper[] wrappers, boolean overwrite )
         throws IOException
     {
-        if ( wrappers != null && wrappers.length > 0 )
+        if ( wrappers == null || wrappers.length == 0 )
         {
-            
-            if ( encoding == null || encoding.isEmpty() ) 
+            if ( overwrite || to.lastModified() < from.lastModified() )
+            {
+                copyFile( from, to );
+            }
+        }
+        else
+        {
+            if ( encoding == null || encoding.isEmpty() )
             {
                 encoding = Charset.defaultCharset().name();
             }
-            
+
             // buffer so it isn't reading a byte at a time!
             try ( Reader fileReader =
-                new BufferedReader( new InputStreamReader( new FileInputStream( from ), encoding ) );
-                            Writer fileWriter = new OutputStreamWriter( new FileOutputStream( to ), encoding ) )
+                    new BufferedReader( new InputStreamReader( new FileInputStream( from ), encoding ) ) )

Review comment:
       what about `Files.newBufferedReader(  )`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] michael-o commented on pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
michael-o commented on pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#issuecomment-623119518


   Please change the buffer code to: https://github.com/apache/maven-wagon/commit/92c0d2a858a38d9b3d0aefc0f9298d5c31c7e508


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] roxspring commented on a change in pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
roxspring commented on a change in pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28#discussion_r419030042



##########
File path: src/main/java/org/apache/maven/shared/utils/io/FileUtils.java
##########
@@ -1826,34 +1831,101 @@ public static void copyFile( @Nonnull File from, @Nonnull File to, @Nullable Str
                                  @Nullable FilterWrapper[] wrappers, boolean overwrite )
         throws IOException
     {
-        if ( wrappers != null && wrappers.length > 0 )
+        if ( wrappers == null || wrappers.length <= 0 )

Review comment:
       Flipped the condition around to get the simple case out of the way and ease the cognitive load when reading the else clause. Maybe I should split this out into a separate commit and make that final check `==` since negative lengths shouldn't be possible!

##########
File path: src/test/java/org/apache/maven/shared/utils/io/FileUtilsTest.java
##########
@@ -1,29 +1,49 @@
 package org.apache.maven.shared.utils.io;
 
-import static org.hamcrest.CoreMatchers.containsString;
-import static org.hamcrest.CoreMatchers.hasItems;
-import static org.hamcrest.CoreMatchers.is;
-import static org.hamcrest.CoreMatchers.not;
-import static org.hamcrest.MatcherAssert.assertThat;
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertTrue;
-import static org.junit.Assert.fail;
-import static org.junit.Assume.assumeFalse;
-import static org.junit.Assume.assumeThat;
-import static org.junit.Assume.assumeTrue;
+import org.apache.maven.shared.utils.Os;
+import org.apache.maven.shared.utils.testhelpers.FileTestHelper;
+import org.codehaus.plexus.util.InterpolationFilterReader;
+import org.hamcrest.CoreMatchers;
+import org.junit.Before;
+import org.junit.Ignore;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.rules.TestName;
 
+import javax.annotation.Nonnull;
 import java.io.BufferedOutputStream;
 import java.io.File;
 import java.io.FileInputStream;
 import java.io.FileNotFoundException;
 import java.io.FileOutputStream;
+import java.io.FileReader;
+import java.io.FileWriter;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
+import java.io.Reader;
+import java.io.Writer;
 import java.net.URL;
 import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.hamcrest.CoreMatchers.hasItems;
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.not;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+import static org.junit.Assume.assumeFalse;
+import static org.junit.Assume.assumeThat;
+import static org.junit.Assume.assumeTrue;

Review comment:
       Looks like Intellij moved some imports when I wasn't looking. Happy to fix if it's an issue.

##########
File path: src/test/java/org/apache/maven/shared/utils/io/FileUtilsTest.java
##########
@@ -438,9 +448,300 @@ public void copyFile1()
             testFile1.lastModified() == destination.lastModified());*/
     }
 
+    private static long MILLIS_PER_DAY = 24 * 60 * 60 * 1000L;
+
+    /** A time today, rounded down to the previous minute */
+    private static long MODIFIED_TODAY = (System.currentTimeMillis() / 60_000) * 60_000;

Review comment:
       Throwing away seconds here allows these precise times to be used accurately even on low resolution file systems such as FAT32.

##########
File path: src/main/java/org/apache/maven/shared/utils/io/FileUtils.java
##########
@@ -1826,34 +1831,101 @@ public static void copyFile( @Nonnull File from, @Nonnull File to, @Nullable Str
                                  @Nullable FilterWrapper[] wrappers, boolean overwrite )
         throws IOException
     {
-        if ( wrappers != null && wrappers.length > 0 )
+        if ( wrappers == null || wrappers.length <= 0 )
         {
-            
-            if ( encoding == null || encoding.isEmpty() ) 
+            if ( to.lastModified() < from.lastModified() || overwrite )
+            {
+                copyFile( from, to );
+            }
+        }
+        else
+        {
+            if ( encoding == null || encoding.isEmpty() )
             {
                 encoding = Charset.defaultCharset().name();
             }
-            
+
             // buffer so it isn't reading a byte at a time!
             try ( Reader fileReader =
-                new BufferedReader( new InputStreamReader( new FileInputStream( from ), encoding ) );
-                            Writer fileWriter = new OutputStreamWriter( new FileOutputStream( to ), encoding ) )
+                          new BufferedReader( new InputStreamReader( new FileInputStream( from ), encoding ) ) )
             {
-
                 Reader wrapped = fileReader;
                 for ( FilterWrapper wrapper : wrappers )
                 {
                     wrapped = wrapper.getReader( wrapped );
                 }
 
-                IOUtil.copy( wrapped, fileWriter );
-            }
-        }
-        else
-        {
-            if ( to.lastModified() < from.lastModified() || overwrite )
-            {
-                copyFile( from, to );
+                if ( overwrite || !to.exists() )
+                {
+                    try ( Writer fileWriter = new OutputStreamWriter( new FileOutputStream( to ), encoding ) )
+                    {
+                        IOUtil.copy( wrapped, fileWriter );
+                    }
+                }
+                else
+                {
+                    CharsetEncoder encoder = Charset.forName( encoding ).newEncoder();
+
+                    int totalBufferSize = FILE_COPY_BUFFER_SIZE;
+
+                    int charBufferSize = ( int ) Math.floor( totalBufferSize / ( 2 + 2 * encoder.maxBytesPerChar() ) );
+                    int byteBufferSize = ( int ) Math.ceil( charBufferSize * encoder.maxBytesPerChar() );

Review comment:
       This is attempting to consume the same amount of buffer memory in total, but split between a `char` buffer and a pair of `byte` buffers - each big enough to contain the encoded `char` buffer.

##########
File path: src/main/java/org/apache/maven/shared/utils/io/FileUtils.java
##########
@@ -109,7 +114,7 @@ protected FileUtils()
     /**
      * The file copy buffer size (30 MB)
      */
-    private static final long FILE_COPY_BUFFER_SIZE = ONE_MB * 30;
+    private static final int FILE_COPY_BUFFER_SIZE = ONE_MB * 30;

Review comment:
       Don't think there was any need for this to be a `long` other than the only previous use case, where it's now automatically cast to a long anyway.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [maven-shared-utils] olamy closed pull request #28: [MSHARED-884] - Don't always overwrite filtered resources

Posted by GitBox <gi...@apache.org>.
olamy closed pull request #28:
URL: https://github.com/apache/maven-shared-utils/pull/28


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org