Posted to issues@commons.apache.org by "Robert Cooper (Jira)" <ji...@apache.org> on 2021/02/03 12:44:00 UTC

[jira] [Updated] (IO-718) FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe

     [ https://issues.apache.org/jira/browse/IO-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Cooper updated IO-718:
-----------------------------
    Environment: 
Apache Commons IO 2.8.0.

JDK 1.8.0_181.

> FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe
> ------------------------------------------------------------------
>
>                 Key: IO-718
>                 URL: https://issues.apache.org/jira/browse/IO-718
>             Project: Commons IO
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.8.0
>         Environment: Apache Commons IO 2.8.0.
> JDK 1.8.0_181.
>            Reporter: Robert Cooper
>            Priority: Major
>
> When {{FileUtils.checksumCRC32}} is called from multiple threads (in order to improve throughput when calculating CRCs for a large folder), the code is not thread-safe and produces incorrect CRC output.
> The following simple test demonstrates the issue:
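> {code:java}
> // Assumes JUnit's @Test and AssertJ's Assertions, plus java.io.File,
> // java.util.*, java.util.concurrent.* and org.apache.commons.io.FileUtils.
> @Test
> public void should() throws ExecutionException, InterruptedException {
>   File testFile = new File("C:\\Temp\\large-file.txt");
>   // Swap in the commented-out line to run the same test with a single thread.
>   // ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
>   ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(5);
>   try {
>     // Compute the CRC of the same file 20 times concurrently.
>     List<Future<Long>> futures = new ArrayList<>();
>     for (int i = 0; i < 20; i++) {
>       futures.add(scheduler.submit(() -> FileUtils.checksumCRC32(testFile)));
>     }
>     List<Long> crcs = new ArrayList<>();
>     for (Future<Long> future : futures) {
>       crcs.add(future.get());
>     }
>     // Every result should equal the first one.
>     Assertions.assertThat(crcs).allMatch(c -> crcs.get(0).equals(c));
>   } finally {
>     scheduler.shutdown();
>   }
> }
> {code}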
> In the above code, with a thread-pool size of 1, all calculated CRCs for the file are the same. With a thread-pool size greater than 1, the CRCs differ.
> The issue appears to be related to the use of a common {{SKIP_BYTE_BUFFER}} in {{IOUtils.consume}}. The multiple threads all read into the same buffer as the data is being "discarded". However, {{FileUtils.checksum}} uses a {{CheckedInputStream}} to calculate the CRC, and that stream updates the checksum from the bytes it has just read into the shared buffer. With multiple threads writing to that buffer, one thread's checksum can be updated from bytes another thread has overwritten, so the CRC mechanism breaks down.
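
A possible workaround, assuming the shared buffer is indeed the cause, is to drain the stream into a buffer owned by each call. The sketch below is not the Commons IO implementation or its fix; the class name {{PerCallBufferChecksum}}, the helper {{crc32}}, and the 8192-byte buffer size are hypothetical choices for illustration. It uses only JDK classes and an in-memory byte array in place of the large file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class PerCallBufferChecksum {

    // Hypothetical helper, not part of Commons IO: computes a CRC32 while
    // draining the stream into a buffer that belongs to this call only.
    public static long crc32(InputStream in) throws IOException {
        byte[] buffer = new byte[8192]; // local, so no other thread can write to it
        try (CheckedInputStream checked = new CheckedInputStream(in, new CRC32())) {
            while (checked.read(buffer, 0, buffer.length) != -1) {
                // CheckedInputStream updates the checksum from the bytes it just read.
            }
            return checked.getChecksum().getValue();
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for the large file used in the report.
        byte[] data = new byte[4 * 1024 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }

        // Single-threaded reference value computed directly with CRC32.
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);
        long expected = reference.getValue();

        ExecutorService pool = Executors.newFixedThreadPool(5);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < 20; i++) {
                futures.add(pool.submit(() -> crc32(new ByteArrayInputStream(data))));
            }
            for (Future<Long> future : futures) {
                if (future.get() != expected) {
                    throw new AssertionError("CRC mismatch");
                }
            }
        } finally {
            pool.shutdown();
        }
        System.out.println("all 20 checksums match");
    }
}
```

Allocating one small buffer per call costs a little memory but removes the shared mutable state. A per-thread buffer (for example via {{ThreadLocal}}) would be another option if allocation pressure matters.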



--
This message was sent by Atlassian Jira
(v8.3.4#803005)