Posted to issues@commons.apache.org by "Sandeep Khadkekar (JIRA)" <ji...@apache.org> on 2015/07/25 01:11:05 UTC

[jira] [Commented] (COMPRESS-185) BZip2CompressorInputStream truncates files compressed with pbzip2

    [ https://issues.apache.org/jira/browse/COMPRESS-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641215#comment-14641215 ] 

Sandeep Khadkekar commented on COMPRESS-185:
--------------------------------------------

I still don't see this working. Am I missing anything here?

We were previously using bzip2 and switched to pbzip2, which gave us a massive performance improvement. However, our client, who uses Apache Commons Compress 1.9 to decompress the archives, reports getting an exception during decompression. Sample decompression code on the client side:

import org.apache.commons.compress.archivers.ArchiveException;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.io.IOUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UncompressTest {

    public static void main(String[] args) throws ArchiveException, IOException {
        String regex = ".*\\/application$";
        List<File> decompFiles = new LinkedList<File>();

        TarArchiveInputStream inputStream = (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", new BZip2CompressorInputStream(new FileInputStream("/Users/skhadkekar/archive/myarchive.tbz")));

        try {
            TarArchiveEntry entry = null;
            Pattern r = Pattern.compile(regex);
            int index = 0;
            while ((entry = (TarArchiveEntry) inputStream.getNextEntry()) != null) {
                final File outputFile = new File("/Users/skhadkekar/archive/uncompress/", entry.getName());
                if (entry.isDirectory()) {
                    System.out.println("The directory is: " + entry.getName());
                    if (!outputFile.exists()) {
                        if (!outputFile.mkdirs()) {
                            throw new IOException("Couldn't create directory for " + outputFile.getAbsolutePath());
                        }
                    }
                } else {
                    Matcher m = r.matcher(entry.getName());
                    System.out.println("The file name in the directory is: " + entry.getName());
                    if (m.find()) {
                        ++index;
                        OutputStream outputFileStream = new FileOutputStream(outputFile);
                        IOUtils.copy(inputStream, outputFileStream);
                        outputFileStream.close();
                        decompFiles.add(outputFile);
                        if (index >= 1) {
                            break;
                        }
                    }
                }
            }
        } catch(Exception eAny) {
            eAny.printStackTrace();
        } finally {
            inputStream.close();
        }
    }
}
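
One thing that may be worth checking is whether the archive really consists of several concatenated bzip2 streams, which is what pbzip2 is understood to produce. Since 1.4, Commons Compress has a two-argument constructor, BZip2CompressorInputStream(InputStream, boolean decompressConcatenated); the one-argument form used above stops after the first stream. Whether that is related to the exception our client sees is not confirmed. The sketch below is a stdlib-only diagnostic and not part of Commons Compress: the class name Bzip2StreamCounter and its methods are hypothetical, and it assumes every non-empty stream starts on a byte boundary with "BZh", a level digit, and the block magic 0x314159265359. It loads the whole file into memory and could in principle miscount if that 10-byte pattern appears by chance inside compressed data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not a Commons Compress class): counts bzip2 stream headers. */
class Bzip2StreamCounter {

    // Assumption: a non-empty bzip2 stream begins with "BZh", a level digit
    // '1'..'9', and then the 48-bit block magic 0x314159265359.
    private static final int[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
    private static final int HEADER_LENGTH = 4 + BLOCK_MAGIC.length;

    /** Returns the number of byte-aligned stream headers found in data. */
    static int countStreams(byte[] data) {
        int count = 0;
        for (int i = 0; i + HEADER_LENGTH <= data.length; i++) {
            if (isHeaderAt(data, i)) {
                count++;
            }
        }
        return count;
    }

    /** Checks for "BZh" + level digit + block magic starting at offset i. */
    private static boolean isHeaderAt(byte[] d, int i) {
        if (d[i] != 'B' || d[i + 1] != 'Z' || d[i + 2] != 'h') {
            return false;
        }
        if (d[i + 3] < '1' || d[i + 3] > '9') {
            return false;
        }
        for (int k = 0; k < BLOCK_MAGIC.length; k++) {
            if ((d[i + 4 + k] & 0xFF) != BLOCK_MAGIC[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole archive into memory; fine for a quick check,
        // not intended for very large files.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("bzip2 stream headers found: " + countStreams(data));
    }
}
```

A count greater than 1 would suggest a multi-stream file, in which case passing true as the second constructor argument may be worth trying on the client side.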

> BZip2CompressorInputStream truncates files compressed with pbzip2
> -----------------------------------------------------------------
>
>                 Key: COMPRESS-185
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-185
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.3
>            Reporter: Karsten Loesing
>             Fix For: 1.4
>
>
> I'm using BZip2CompressorInputStream in Compress 1.3 to decompress a file that was created with pbzip2 1.1.6 (http://compression.ca/pbzip2/).  The stream ends early after 900000 bytes, truncating the rest of the pbzip2-compressed file.  Decompressing the file with bunzip2 or compressing the original file with bzip2 both fix the issue.  I think both pbzip2 and Compress are to blame here: pbzip2 apparently does something non-standard when compressing files, and Compress should handle the non-standard format rather than pretending to be done decompressing.  Another option is that I'm doing something wrong; in that case please let me know! :)
> Here's how the problem can be reproduced:
>  1. Generate a file that's 900000+ bytes large: dd if=/dev/zero of=1mbfile count=1 bs=1M
>  2. Compress with pbzip2: pbzip2 1mbfile
>  3. Decompress with Bunzip2 class below
>  4. Notice how the resulting 1mbfile is 900000 bytes large, not 1M.
> Now compare to using bunzip2/bzip2:
>  - Do the steps above, but instead of 2, compress with bzip2: bzip2 1mbfile
>  - Do the steps above, but instead of 3, decompress with bunzip2: bunzip2 1mbfile.bz2
> import java.io.*;
> import org.apache.commons.compress.compressors.bzip2.*;
> public class Bunzip2 {
>   public static void main(String[] args) throws Exception {
>     File inFile = new File(args[0]);
>     File outFile = new File(args[0].substring(0, args[0].length() - 4));
>     FileInputStream fis = new FileInputStream(inFile);
>     BZip2CompressorInputStream bz2cis =
>         new BZip2CompressorInputStream(fis);
>     BufferedInputStream bis = new BufferedInputStream(bz2cis);
>     BufferedOutputStream bos = new BufferedOutputStream(
>         new FileOutputStream(outFile));
>     int len;
>     byte[] data = new byte[1024];
>     while ((len = bis.read(data, 0, 1024)) >= 0) {
>       bos.write(data, 0, len);
>     }   
>     bos.close();
>     bis.close();
>   }
> }
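
The 900000-byte figure in the quoted report can be checked with some quick arithmetic. The sketch below assumes, without confirmation from the pbzip2 sources, that pbzip2 by default splits its input into 900000-byte chunks and compresses each chunk as an independent bzip2 stream; the class and method names are hypothetical. Under that assumption, a reader that stops after the first stream would return exactly 900000 bytes of the 1 MiB test file.

```java
/** Hypothetical arithmetic helper; the 900000-byte chunk size is an assumption. */
class Pbzip2ChunkMath {

    /** Number of independent streams expected for the given input size. */
    static long streamCount(long inputBytes, long chunkBytes) {
        return (inputBytes + chunkBytes - 1) / chunkBytes; // ceiling division
    }

    /** Bytes produced by a reader that stops after the first stream. */
    static long bytesFromFirstStream(long inputBytes, long chunkBytes) {
        return Math.min(inputBytes, chunkBytes);
    }

    public static void main(String[] args) {
        long input = 1024L * 1024L; // size written by: dd ... count=1 bs=1M
        long chunk = 900_000L;      // assumed pbzip2 default chunk size
        System.out.println("expected streams: " + streamCount(input, chunk));
        System.out.println("bytes from first stream only: "
                + bytesFromFirstStream(input, chunk));
    }
}
```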


