Posted to issues@commons.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/03/02 19:33:00 UTC

[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

     [ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=396337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396337 ]

ASF GitHub Bot logged work on IO-649:
-------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Mar/20 19:32
            Start Date: 02/Mar/20 19:32
    Worklog Time Spent: 10m 
      Work Description: garydgregory commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-593576953
 
 
   Sorry about the delay, last day of vacation here ;-)
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 396337)
    Time Spent: 5.5h  (was: 5h 20m)

> IOUtils contentEquals method performance improvements
> -----------------------------------------------------
>
>                 Key: IO-649
>                 URL: https://issues.apache.org/jira/browse/IO-649
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.0, 1.1
>            Reporter: Brett Lounsbury
>            Priority: Major
>             Fix For: 2.6
>
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
>  
> contentEquals() wraps any given InputStream/Reader in a buffered version (if it is not already buffered), which avoids most of the I/O penalty, but it then reads one byte/character at a time. This results in significantly more method calls and a lot of byte-to-int conversion, since read() returns an int between 0 and 255 (or -1 at end of stream) rather than a byte.
>  
> I have a change that makes the contentEquals() methods read content into byte/char arrays and compare those arrays in batches with Arrays.equals, instead of wrapping the inputs in a BufferedInputStream or BufferedReader and calling the single byte/char read() methods. This reduces the number of method invocations by a factor equal to the buffer size and avoids converting every byte read to an int.
>  
> The following figures show the performance increase over 1000 iterations of comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O). The test ran on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to avoid GC as a source of latency:
> Average: 7236 ms to 858 ms (8.43x speedup)
> P50: 7224 ms to 856 ms (8.44x speedup)
> P90: 7249 ms to 860 ms (8.43x speedup)
> P99: 7410 ms to 913 ms (8.12x speedup)
> P100: 8330 ms to 1278 ms (6.52x speedup)
>  
> The following figures show the performance increase over 1000 iterations of comparing two 1 GB Readers of character data (held in memory to avoid I/O). The test ran on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to avoid GC as a source of latency:
> Average: 11281 ms to 1737 ms (6.50x speedup)
> P50: 11262 ms to 1735 ms (6.49x speedup)
> P90: 11292 ms to 1741 ms (6.49x speedup)
> P99: 11707 ms to 1774 ms (6.60x speedup)
> P100: 12176 ms to 1884 ms (6.46x speedup)
>  
>  
>  
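
The batch comparison described in the issue can be sketched roughly as follows for the InputStream case. This is a minimal illustration only, not the code from pull request #101: the class name BufferedContentEquals, the helper fill(), and the 8192-byte buffer size are assumptions made for the example. It reads both streams into byte arrays and compares each pair of chunks with Arrays.equals.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical class name; an illustration of the idea, not the Commons IO implementation.
final class BufferedContentEquals {

    // Assumed buffer size; the issue does not state which size was used.
    private static final int BUFFER_SIZE = 8192;

    private BufferedContentEquals() {
    }

    // Reads until buf is full or the stream ends, since a single read() call
    // may return fewer bytes than requested. Returns the number of bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        if (a == b) {
            return true;
        }
        byte[] bufA = new byte[BUFFER_SIZE];
        byte[] bufB = new byte[BUFFER_SIZE];
        while (true) {
            int readA = fill(a, bufA);
            int readB = fill(b, bufB);
            if (readA != readB) {
                return false; // one stream ended before the other
            }
            // Comparing the whole arrays is safe here: bytes past readA are either
            // the initial zeros or leftovers from an earlier chunk that already
            // compared equal, so the unused tails of both buffers are identical.
            if (!Arrays.equals(bufA, bufB)) {
                return false;
            }
            if (readA < BUFFER_SIZE) {
                return true; // both streams ended at the same position
            }
        }
    }
}
```

Filling each buffer completely before comparing keeps the two streams aligned even when one of them returns short reads. A Reader variant would follow the same shape with char[] buffers.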


