Posted to issues@commons.apache.org by "Gary D. Gregory (Jira)" <ji...@apache.org> on 2021/02/17 16:18:00 UTC
[jira] [Resolved] (IO-718) FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe
[ https://issues.apache.org/jira/browse/IO-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary D. Gregory resolved IO-718.
--------------------------------
Fix Version/s: 2.9.0
Resolution: Fixed
> FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe
> ------------------------------------------------------------------
>
> Key: IO-718
> URL: https://issues.apache.org/jira/browse/IO-718
> Project: Commons IO
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.8.0
> Environment: Apache Commons IO 2.8.0.
> JDK 1.8.0_181.
> Reporter: Robert Cooper
> Priority: Major
> Fix For: 2.9.0
>
>
> When {{FileUtils.checksumCRC32}} is called from multiple threads (to improve throughput when calculating CRCs for a large folder), the method is not thread-safe and returns incorrect CRC values.
> The following simple test demonstrates the issue:
> {code:java}
> @Test
> public void should() throws ExecutionException, InterruptedException {
>     File testFile = new File("C:\\Temp\\large-file.txt");
>     // With a pool size of 1, all checksums match:
>     // ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
>     ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(5);
>     // Submit 20 checksum calculations for the same file.
>     List<Future<Long>> futures = new ArrayList<>();
>     for (int i = 0; i < 20; i++) {
>         futures.add(scheduler.submit(() -> FileUtils.checksumCRC32(testFile)));
>     }
>     // Collect the results; every CRC should equal the first one.
>     List<Long> crcs = new ArrayList<>();
>     for (Future<Long> future : futures) {
>         crcs.add(future.get());
>     }
>     scheduler.shutdown();
>     Assertions.assertThat(crcs).allMatch(c -> crcs.get(0).equals(c));
> }
> {code}
> In the above code, with a thread-pool size of 1, all calculated CRCs for the file are the same. With a thread-pool size greater than 1, the CRCs differ.
> The issue appears to be related to the use of a common {{SKIP_BYTE_BUFFER}} in {{IOUtils.consume}}. All threads read into the same buffer while the data is being "discarded". However, {{FileUtils.checksum}} uses a {{CheckedInputStream}} to calculate the CRC, and that stream updates the checksum from the bytes it finds in the buffer after each read. When several threads write to the shared buffer at once, one thread's checksum can be updated with bytes that another thread has just written, so the CRC mechanism breaks down.
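The description above suggests one way to avoid the race: drain the {{CheckedInputStream}} through a buffer that belongs to the calling thread. The following is a minimal sketch of that idea using only JDK classes. The class name {{LocalBufferChecksum}}, the method {{crc32}}, and the 8192-byte buffer size are hypothetical choices for illustration; this is not the change that was applied in Commons IO 2.9.0.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

/** Hypothetical helper, not part of Commons IO. */
public final class LocalBufferChecksum {

    private LocalBufferChecksum() {
    }

    /** Computes the CRC32 of a file using a buffer owned by this call. */
    public static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        // Per-call buffer (size is an arbitrary assumption): no other thread
        // can overwrite it between the read and the checksum update that
        // CheckedInputStream performs on the same array.
        byte[] buffer = new byte[8192];
        try (InputStream in = new CheckedInputStream(Files.newInputStream(file), crc)) {
            while (in.read(buffer) != -1) {
                // The bytes are discarded; the stream updates crc as a side effect.
            }
        }
        return crc.getValue();
    }
}
```

Because each call allocates its own buffer and its own {{CRC32}} instance, concurrent calls for the same file share no mutable state, at the cost of one small allocation per call.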