You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Gary D. Gregory (Jira)" <ji...@apache.org> on 2021/02/17 16:50:00 UTC
[jira] [Commented] (IO-649) IOUtils contentEquals method performance improvements

    [ https://issues.apache.org/jira/browse/IO-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285970#comment-17285970 ] 

Gary D. Gregory commented on IO-649:
------------------------------------

Please see git master and close ticket if the current behavior works for you.

 

> IOUtils contentEquals method performance improvements
> -----------------------------------------------------
>
>                 Key: IO-649
>                 URL: https://issues.apache.org/jira/browse/IO-649
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.0, 1.1
>            Reporter: Brett Lounsbury
>            Priority: Major
>             Fix For: 2.6
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
>  
> contentEquals() internally wraps any given InputStream/Reader in a Buffered version (if it is not already buffered) which avoids a lot of IO penalties, but then it proceeds to read each byte/character one at a time.  This leads to significantly more method calls and also a lot of byte -> int casting since the read() method returns an int between 0 and 255 instead of returning a byte.
>  
> I have a change that modifies the contentEquals() methods to internally buffer content into a byte/char array and to then do batch comparisons of those arrays using Arrays.equals instead of using a BufferedInputStream or BufferedReader and making use of the single byte/char read() methods.  This reduces the number of method invocations by a factor equal to the buffer size and avoids casting every byte read to an int.
>  
> The following table shows the performance increase over 1000 iterations of comparing 2 1GB InputStream of binary data (stored in memory to avoid I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232 and there was a forced System.gc() between each iteration to avoid GC as a source of latency:
> Average: 7236 to 858ms (8.43x speedup)
> P50: 7224 to 856ms (8.44x speedup)
> P90: 7249 to 860ms (8.43x speedup)
> P99: 7410 to 913ms (8.12x speedup)
> P100: 8330 to 1278ms (6.52x speedup)
>  
> The following table shows the performance increase over 1000 iterations of comparing 2 1GB Reader of character data (stored in memory to avoid I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232 and there was a forced System.gc() between each iteration to avoid GC as a source of latency:
> Average: 11281 to 1737ms (6.50x speedup)
> P50: 11262 to 1735ms (6.49x speedup)
> P90: 11292 to 1741ms (6.49x speedup)
> P99: 11707 to 1774ms (6.60x speedup)
> P100: 12176 to 1884ms (6.46x speedup)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)