Posted to issues@commons.apache.org by "Peter Lee (Jira)" <ji...@apache.org> on 2020/04/22 08:58:00 UTC

[jira] [Issue Comment Deleted] (COMPRESS-510) Multiple retrievals of InputStream for same SevenZFile entry fails

     [ https://issues.apache.org/jira/browse/COMPRESS-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Lee updated COMPRESS-510:
-------------------------------
    Comment: was deleted

(was: > Commons-compress does not get a CRC in the SevenZFile#readFilesInfo method. But the 7z GUI shows me a CRC (784DD132 for test.txt).

I have figured out why this happens, but it is a little complicated, especially for those who are not familiar with the 7z format specification (and the 7z documentation is terrible).

The 7z format has something called a 'Folder'. It is nothing special, just a bunch of files. All the files in the same 'Folder' are treated as a single file and compressed together. This is different from Zip, and it is why random access is difficult in 7z. The compressed 'Folder' also has a CRC checksum.
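
To illustrate why compressing files together makes random access hard, here is a minimal sketch. It is not how Commons Compress or 7z implement folders: it uses the JDK's Deflate streams as a stand-in codec, and the class and method names (SolidFolderSketch, packTogether, readFile) are made up for this example. The point it shows is that reading a later file means decoding everything stored before it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration only; Deflate stands in for a real 7z coder.
class SolidFolderSketch {

    /** Compresses all files as one continuous stream, like a single 'Folder'. */
    static byte[] packTogether(byte[][] files) {
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(packed)) {
            for (byte[] file : files) {
                deflater.write(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return packed.toByteArray();
    }

    /** Reads the file at the given index by decoding from the start of the stream. */
    static byte[] readFile(byte[] packed, int[] sizes, int index) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(packed))) {
            for (int i = 0; i < index; i++) {
                readFully(in, sizes[i]); // earlier files must be decoded and discarded
            }
            return readFully(in, sizes[index]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(buffer, offset, length - offset);
            if (n < 0) {
                throw new IOException("Unexpected end of stream");
            }
            offset += n;
        }
        return buffer;
    }

    public static void main(String[] args) {
        byte[] test = "Test".getBytes(StandardCharsets.UTF_8);
        byte[] test1 = "Test1".getBytes(StandardCharsets.UTF_8);
        byte[] packed = packTogether(new byte[][] {test, test1});
        int[] sizes = {test.length, test1.length};
        // Getting test1.txt requires decoding test.txt first.
        System.out.println(new String(readFile(packed, sizes, 1), StandardCharsets.UTF_8));
    }
}
```

In a Zip archive each entry is compressed on its own, so a reader can jump straight to the entry it wants. In the sketch above there is no such entry point for the second file.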

In your test case, the 'Folder' contains only 1 file, which is test.txt. The CRC checksum of the 'Folder' is 784DD132.

You can generate another 7z like this:
{code:java}
@Test
public void retrieveInputStreamForAllEntriesMultipleTimes() throws IOException {
    try (final SevenZOutputFile out = new SevenZOutputFile(new File(dir, "test.7z"))) {
        final Path inputFile = Files.createTempFile("SevenZTestTemp", "");

        SevenZArchiveEntry entry = out.createArchiveEntry(inputFile.toFile(), "test.txt");
        out.putArchiveEntry(entry);
        out.write("Test".getBytes(StandardCharsets.UTF_8));

        SevenZArchiveEntry entry1 = out.createArchiveEntry(inputFile.toFile(), "test1.txt");
        out.putArchiveEntry(entry1);
        out.write("Test1".getBytes(StandardCharsets.UTF_8));
        out.closeArchiveEntry();

        Files.deleteIfExists(inputFile);
    }

    try (SevenZFile sevenZFile = new SevenZFile(new File(dir, "test.7z"))) {
        for (SevenZArchiveEntry entry : sevenZFile.getEntries()) {
            byte[] firstRead = IOUtils.toByteArray(sevenZFile.getInputStream(entry));
            byte[] secondRead = IOUtils.toByteArray(sevenZFile.getInputStream(entry));
            assertArrayEquals(firstRead, secondRead);
        }
    }
}
{code}
This will generate a 7z with a folder containing 2 files: test.txt and test1.txt. You can then check the CRC in the 7z GUI. I checked it with 7z GUI 19.00. The CRC32 of test1.txt turns out to be 91DE2B91 and the CRC32 of test.txt is empty.

!image-2020-04-22-16-55-08-369.png!

Actually 91DE2B91 is the CRC checksum of the folder, not of test1.txt or test.txt. I have double-checked the code: Commons Compress does not add a CRC for each entry, it only provides a CRC for the folder.
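
One way to cross-check which bytes a displayed CRC covers is to compute CRC-32 values directly with the JDK and compare them with what the 7z GUI shows. The sketch below assumes that a folder checksum is a plain CRC-32 over the concatenated uncompressed contents of its files. That is an assumption of this example, not something confirmed here, and FolderCrcSketch and crcOf are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper for comparing CRC-32 values against the 7z GUI.
class FolderCrcSketch {

    /** CRC-32 over the concatenation of the given files, as 8 upper-case hex digits. */
    static String crcOf(byte[]... files) {
        CRC32 crc = new CRC32();
        for (byte[] file : files) {
            crc.update(file, 0, file.length);
        }
        return String.format("%08X", crc.getValue());
    }

    public static void main(String[] args) {
        byte[] test = "Test".getBytes(StandardCharsets.UTF_8);
        byte[] test1 = "Test1".getBytes(StandardCharsets.UTF_8);
        System.out.println("test.txt alone:            " + crcOf(test));
        System.out.println("test1.txt alone:           " + crcOf(test1));
        System.out.println("test.txt then test1.txt:   " + crcOf(test, test1));
    }
}
```

Whichever printed value matches the one in the GUI tells you which bytes that checksum was computed over.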

I think the 7z GUI is a bit confusing: it shows the CRC of the folder on the line of one of the files in that folder.)

> Multiple retrievals of InputStream for same SevenZFile entry fails
> ------------------------------------------------------------------
>
>                 Key: COMPRESS-510
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-510
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Robin Schimpf
>            Assignee: Peter Lee
>            Priority: Major
>         Attachments: image-2020-04-22-16-55-08-369.png
>
>
> I was trying out the new random access for 7z files, and one of our tests fails when it reads the same entry multiple times without closing the archive.
> Reproducing test case (I added this locally to the SevenZFileTest class):
> {code:java}
> @Test
> public void retrieveInputStreamForEntryMultipleTimes() throws IOException {
>     try (SevenZFile sevenZFile = new SevenZFile(getFile("bla.7z"))) {
>         for (SevenZArchiveEntry entry : sevenZFile.getEntries()) {
>             byte[] firstRead = IOUtils.toByteArray(sevenZFile.getInputStream(entry));
>             byte[] secondRead = IOUtils.toByteArray(sevenZFile.getInputStream(entry));
>             assertArrayEquals(firstRead, secondRead);
>         }
>     }
> }
> {code}
> The Exception thrown is
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 2
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecodingStream(SevenZFile.java:1183)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.getInputStream(SevenZFile.java:1436)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFileTest.retrieveInputStreamForEntryMultipleTimes(SevenZFileTest.java:688)
> 	...
> {code}
> A similar test case for e.g. zip works fine
> {code:java}
> @Test
> public void retrieveInputStreamForEntryMultipleTimes() throws IOException {
>     try (ZipFile zipFile = new ZipFile(getFile("bla.zip"))) {
>         Enumeration<ZipArchiveEntry> entry = zipFile.getEntries();
>         while (entry.hasMoreElements()) {
>             ZipArchiveEntry e = entry.nextElement();
>             byte[] firstRead = IOUtils.toByteArray(zipFile.getInputStream(e));
>             byte[] secondRead = IOUtils.toByteArray(zipFile.getInputStream(e));
>             assertArrayEquals(firstRead, secondRead);
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)