Posted to issues@commons.apache.org by "Robert Cooper (Jira)" <ji...@apache.org> on 2021/02/03 12:34:00 UTC

[jira] [Created] (IO-718) FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe

Robert Cooper created IO-718:
--------------------------------

             Summary: FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe
                 Key: IO-718
                 URL: https://issues.apache.org/jira/browse/IO-718
             Project: Commons IO
          Issue Type: Bug
          Components: Utilities
    Affects Versions: 2.8.0
            Reporter: Robert Cooper


{{FileUtils.checksumCRC32}} is not thread-safe. Calling it from multiple threads (in order to improve throughput when calculating CRCs for a large folder) produces incorrect CRC output.

The following simple test demonstrates the issue:
{code:java}
@Test
public void checksumIsConsistentAcrossThreads() throws ExecutionException, InterruptedException {
  File testFile = new File("C:\\Temp\\large-file.txt");
  // With a pool size of 1 the test passes:
  // ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(5);
  try {
    // Checksum the same file 20 times concurrently.
    List<Future<Long>> futures = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      futures.add(scheduler.submit(() -> FileUtils.checksumCRC32(testFile)));
    }
    List<Long> crcs = new ArrayList<>();
    for (Future<Long> future : futures) {
      crcs.add(future.get());
    }
    // Every run reads the same file, so every CRC should be identical.
    Assertions.assertThat(crcs).allMatch(c -> crcs.get(0).equals(c));
  } finally {
    scheduler.shutdown();
  }
}
{code}
In the above code, with a thread-pool size of 1, all calculated CRCs for the file are the same.  With a thread-pool size greater than 1, the CRCs differ.

The issue appears to be related to the use of a common {{SKIP_BYTE_BUFFER}} in {{IOUtils.consume}}.  The multiple threads all read into the same buffer as the data is being "discarded".  However, {{FileUtils.checksum}} uses a {{CheckedInputStream}} to calculate the CRC, which updates the checksum from the bytes it has just read into that shared buffer.  With multiple threads writing to the buffer concurrently, one thread can overwrite the contents before another thread's checksum has been updated from them, so the resulting CRC is wrong.
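
As a possible workaround until this is fixed, the sketch below computes the CRC with a buffer allocated per call, so that no state is shared between threads. It uses only JDK classes. The class name {{LocalBufferChecksum}}, its {{checksumCRC32(Path)}} method and the 8192-byte buffer size are assumptions made for illustration; they are not part of Commons IO and this is not a proposed patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper for illustration only; not part of Commons IO.
class LocalBufferChecksum {

    // Computes the CRC32 of a file using a buffer that is private to this call.
    static long checksumCRC32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192]; // assumed size; allocated per call, never shared
        try (InputStream in = new CheckedInputStream(Files.newInputStream(file), crc)) {
            // The bytes are discarded; CheckedInputStream updates crc on every read.
            while (in.read(buffer) != -1) {
                // nothing to do
            }
        }
        return crc.getValue();
    }

    // Small demo: checksum one temporary file from several threads at once.
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("crc-demo", ".bin");
        Files.write(file, new byte[1 << 20]);
        ExecutorService pool = Executors.newFixedThreadPool(5);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < 20; i++) {
                futures.add(pool.submit(() -> checksumCRC32(file)));
            }
            Set<Long> distinct = new HashSet<>();
            for (Future<Long> future : futures) {
                distinct.add(future.get());
            }
            System.out.println("distinct CRC values: " + distinct.size());
        } finally {
            pool.shutdown();
            Files.delete(file);
        }
    }
}
```

Because each call owns its {{CRC32}} instance and its buffer, concurrent calls on the same file should not interfere with one another, at the cost of one small allocation per call.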



--
This message was sent by Atlassian Jira
(v8.3.4#803005)