Posted to issues@commons.apache.org by "Cosmin Carabet (Jira)" <ji...@apache.org> on 2024/03/01 16:03:00 UTC

[jira] [Commented] (COMPRESS-666) Multithreaded access to Tar archive throws java.util.zip.ZipException: Corrupt GZIP trailer

    [ https://issues.apache.org/jira/browse/COMPRESS-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822626#comment-17822626 ] 

Cosmin Carabet commented on COMPRESS-666:
-----------------------------------------

That's an interesting theory. With the above, the block size becomes 512, which evenly divides the default BufferedInputStream buffer size (8192 bytes). Unfortunately, I've just tried it out and I'm hitting the same issue. The archive is a tgz of around 10 MB.

Given that I'm creating a new TarArchiveInputStream object every time, I wouldn't expect any contention on those buffers. Is there some sort of shared state between objects by default? Also, with the larger tar, things work correctly on 1.25.0.
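
To narrow down which layer is at fault, a control experiment that uses only JDK classes might help. The sketch below is not from the issue: the class name GzipIsolationCheck and its helper methods are hypothetical, and it assumes an in-memory payload rather than the real tgz. Each task decompresses the same gzip bytes through its own GZIPInputStream, reading 512 bytes at a time to mimic the tar record size mentioned above, and compares the result with the original data. If this stays clean on many threads, the GZIP layer alone is less likely to be the culprit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; it does not touch Commons Compress at all.
final class GzipIsolationCheck {

    /** Compresses a byte array with the JDK's GZIPOutputStream. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    /** Decompresses through a fresh GZIPInputStream, requesting up to 512 bytes per read. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] record = new byte[512];
            int n;
            while ((n = in.read(record)) != -1) {
                out.write(record, 0, n);
            }
        }
        return out.toByteArray();
    }

    /** Runs independent decompressions on a fixed pool and counts results that differ from the input. */
    static int countMismatches(byte[] raw, int threads, int tasks) throws IOException {
        byte[] compressed = gzip(raw);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Boolean>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(CompletableFuture.supplyAsync(() -> {
                    try {
                        // Every task builds its own stream objects; only the byte arrays are shared, read-only.
                        return Arrays.equals(raw, gunzip(compressed));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }, pool));
            }
            int mismatches = 0;
            for (CompletableFuture<Boolean> f : futures) {
                if (!f.join()) {
                    mismatches++;
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        // Deterministic 1 MiB payload, an assumption standing in for a real archive.
        byte[] raw = new byte[1 << 20];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (i * 31 + (i >>> 8));
        }
        System.out.println("mismatches: " + countMismatches(raw, 10, 200));
    }
}
```

The pool size of 10 and the 200 tasks mirror the reproduction in the issue description. A clean run here would not prove anything about TarArchiveInputStream itself; it only helps separate the two layers.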

> Multithreaded access to Tar archive throws java.util.zip.ZipException: Corrupt GZIP trailer
> -------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-666
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-666
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.26.0
>         Environment: Commons compress 1.26.0 to get a failure. Any tar tgz.
>            Reporter: Cosmin Carabet
>            Priority: Major
>
> Something in [https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master] seems to make iterating through the tar entries of multiple TarArchiveInputStreams throw "java.io.IOException: Corrupted TAR archive":
>  
> {code:java}
> @Test
> void bla() {
>     ExecutorService executorService = Executors.newFixedThreadPool(10);
>     List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
>             .mapToObj(_idx -> CompletableFuture.runAsync(
>                     () -> {
>                         try (InputStream inputStream = this.getClass()
>                                         .getResourceAsStream(
>                                                 "/<your favourite tar tgz>");
>                                 TarArchiveInputStream tarInputStream =
>                                         new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
>                             TarArchiveEntry tarEntry;
>                             while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
>                                 System.out.println("Reading entry %s with size %d"
>                                         .formatted(tarEntry.getName(), tarEntry.getSize()));
>                             }
>                         } catch (Exception ex) {
>                             throw new RuntimeException(ex);
>                         }
>                     },
>                     executorService))
>             .toList();
>     Futures.getUnchecked(CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])));
> } {code}
> Although TarArchiveInputStream is marked as not thread safe, I am not reusing objects here. Those are in fact separate objects, presumably all with their own position tracking info.
>  
> The stacktrace here looks like:
> {code:java}
> Caused by: java.io.IOException: Corrupted TAR archive.
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
>     at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
>     at
> Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
>     at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
>     at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
>     ... 7 more
>  {code}
> That code shows that the header is occasionally wrong (the tar entry name contains gibberish), which makes me think that `getNextTarEntry()` can be faulty.
>  
> Running that code with Commons Compress 1.25.0 works as expected, so the cause is probably something added since November. Note that this is related to parallelism: using an executor service with a single thread doesn't produce the error. The tgz to decompress doesn't really matter; a manually created one of a few KB is enough.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)