Posted to issues@commons.apache.org by "Robert Cooper (Jira)" <ji...@apache.org> on 2021/02/03 12:39:00 UTC

[jira] [Commented] (IO-718) FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe

    [ https://issues.apache.org/jira/browse/IO-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277973#comment-17277973 ] 

Robert Cooper commented on IO-718:
----------------------------------

Note that I did not see anything in the documentation stating that the code is not thread-safe, although it's entirely possible I missed it.

Also worth noting: the code in which we found this issue does not calculate the CRC of the same file repeatedly. It works on many different files. The test in the issue description is just a quick demonstration of the problem.

> FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe
> ------------------------------------------------------------------
>
>                 Key: IO-718
>                 URL: https://issues.apache.org/jira/browse/IO-718
>             Project: Commons IO
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.8.0
>            Reporter: Robert Cooper
>            Priority: Major
>
> When {{FileUtils.checksumCRC32}} is called from multiple threads (to improve throughput when calculating CRCs for a large folder), the code is not thread-safe and produces incorrect CRCs.
> The following simple test demonstrates the issue:
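> {code:java}
> import java.io.File;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.ExecutionException;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
> import java.util.concurrent.ScheduledExecutorService;
> import org.apache.commons.io.FileUtils;
> import org.assertj.core.api.Assertions;
> import org.junit.jupiter.api.Test; // JUnit 5 assumed; the original snippet had no imports
>
> public class ChecksumThreadSafetyTest { // hypothetical wrapper class
>   @Test
>   public void should() throws ExecutionException, InterruptedException {
>     File testFile = new File("C:\\Temp\\large-file.txt");
>     // With a pool of 1 thread, every CRC matches:
>     // ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
>     ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(5);
>     List<Future<Long>> futures = new ArrayList<>();
>     for (int i = 0; i < 20; i++) {
>       futures.add(scheduler.submit(() -> FileUtils.checksumCRC32(testFile)));
>     }
>     List<Long> crcs = new ArrayList<>();
>     for (Future<Long> future : futures) {
>       crcs.add(future.get());
>     }
>     scheduler.shutdown();
>     // All 20 checksums of the same file should be identical.
>     Assertions.assertThat(crcs).allMatch(c -> crcs.get(0).equals(c));
>   }
> } {code}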
> In the above code, with a thread-pool size of 1, all CRCs calculated for the file are the same. With a larger pool, the CRCs differ.
> The issue appears to be related to the use of a shared {{SKIP_BYTE_BUFFER}} in {{IOUtils.consume}}. Because the data is being "discarded", all threads read into the same buffer. However, {{FileUtils.checksum}} uses a {{CheckedInputStream}} to calculate the CRC, and that stream updates the checksum from the bytes it has just read into the shared buffer. If another thread writes into the buffer before or during that update, the CRC is computed over the wrong bytes.
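A minimal sketch of the kind of change that would avoid the problem. It assumes the shared skip buffer is the only shared state involved. Each call drains its own {{CheckedInputStream}} into a buffer allocated for that call, so no other thread can touch the bytes that {{CRC32.update}} sees. The class and the {{crc32}}/{{crc32All}} helpers are hypothetical and are not the Commons IO API. In-memory byte arrays stand in for files to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class LocalBufferChecksum {

    // Hypothetical helper (not Commons IO API): drains the stream through a
    // CheckedInputStream using a buffer allocated per call, so no other thread
    // can overwrite the bytes before the checksum is updated from them.
    static long crc32(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        try (CheckedInputStream checked = new CheckedInputStream(in, crc)) {
            byte[] buffer = new byte[8192]; // local to this call, never shared
            while (checked.read(buffer, 0, buffer.length) != -1) {
                // Data is discarded; read() updates the checksum as a side effect.
            }
        }
        return crc.getValue();
    }

    // Hypothetical driver: checksums each input on a fixed thread pool, in the
    // spirit of the reporter's 5-thread test. Each task has its own stream and buffer.
    static List<Long> crc32All(List<byte[]> inputs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (byte[] data : inputs) {
                futures.add(pool.submit(() -> crc32(new ByteArrayInputStream(data))));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        List<byte[]> inputs = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            inputs.add(data);
        }
        // Prints the CRC-32 of "123456789" as computed on the pool.
        System.out.println(Long.toHexString(crc32All(inputs, 5).get(0)));
    }
}
```

Running {{main}} should print {{cbf43926}}, the standard CRC-32 check value for "123456789", whatever the pool size. The cost of the per-call buffer is one 8 KB allocation per checksum. That is small compared with reading a large file.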



--
This message was sent by Atlassian Jira
(v8.3.4#803005)