Posted to issues@commons.apache.org by "Dominique De Munck (JIRA)" <ji...@apache.org> on 2017/02/03 14:45:52 UTC

[jira] [Commented] (COMPRESS-381) performance issue when using default Wiki/docs bzip2 compression Factory methods

    [ https://issues.apache.org/jira/browse/COMPRESS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851574#comment-15851574 ] 

Dominique De Munck commented on COMPRESS-381:
---------------------------------------------

I got some time to figure this one out.
The following code works for files larger than 4 GB and with any Java version. Wrapping the FileOutputStream in a BufferedOutputStream makes a huge difference.
It turns a 1.4 GB dbf file into a 63 MB bzip2 file in 1 min 25 s on my laptop, whereas the default tutorial code needs about 4 min 41 s.
For comparison, 7-Zip needs about 3 min and produces 80 MB; RAR needs 40 seconds, also for 80 MB.

So I suggest adding the following code (or a variant) to the examples; it can make a huge difference for users!

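>>>
// bzip2 block size in units of 100 kB (valid range 1..9)
int COMPRESSION_LEVEL = 2;
// size in bytes of both the read buffer and the BufferedOutputStream
int buffersize = 4000;

FileInputStream fin = new FileInputStream(infile);
FileOutputStream fos = new FileOutputStream(outfile);
// buffer the compressor's output before it reaches the file
BufferedOutputStream bufferout = new BufferedOutputStream(fos, buffersize);
BZip2CompressorOutputStream bzOut = new BZip2CompressorOutputStream(bufferout, COMPRESSION_LEVEL);
try {
	final byte[] buffer = new byte[buffersize];
	int n = 0;
	while (-1 != (n = fin.read(buffer))) {
		bzOut.write(buffer, 0, n);
	}
}
finally {
	try {
		// finishes the bzip2 stream, then closes bufferout and fos
		bzOut.close();
	}
	finally {
		// close the input even if closing the output failed
		fin.close();
	}
}
>>>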
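
To illustrate why the BufferedOutputStream might matter, here is a minimal sketch that uses only the JDK. It does not call Commons Compress at all. It assumes, without that being confirmed in this issue, that the compressor hands its output to the underlying stream in very small writes, and it simulates that with single-byte writes. BufferingDemo, CountingSink and emit are hypothetical names made up for this example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferingDemo {

    /** Hypothetical sink that counts how many write calls reach it. */
    static final class CountingSink extends OutputStream {
        long calls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /**
     * Writes totalBytes one byte at a time, as a stand-in for a producer
     * that emits small writes. A bufferSize of 0 means "no buffering".
     */
    static CountingSink emit(int totalBytes, int bufferSize) throws IOException {
        CountingSink sink = new CountingSink();
        OutputStream out = bufferSize > 0
                ? new BufferedOutputStream(sink, bufferSize)
                : sink;
        for (int i = 0; i < totalBytes; i++) {
            out.write(i & 0xFF);
        }
        out.close(); // flushes any bytes still held in the buffer
        return sink;
    }

    public static void main(String[] args) throws IOException {
        CountingSink direct = emit(1000000, 0);
        CountingSink buffered = emit(1000000, 4000);
        System.out.println("write calls without buffer: " + direct.calls);
        System.out.println("write calls with buffer:    " + buffered.calls);
    }
}
```

Both runs deliver the same number of bytes, but the buffered run reaches the sink in far fewer calls. With a FileOutputStream as the sink, each of those calls would be a separate request to the operating system, which is one plausible source of the slowdown described here.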

> performance issue when using default Wiki/docs bzip2 compression Factory methods
> --------------------------------------------------------------------------------
>
>                 Key: COMPRESS-381
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-381
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 1.13
>         Environment: Windows/All
>            Reporter: Dominique De Munck
>            Priority: Minor
>              Labels: documentation, easyfix, performance
>
> Hello
> We are going to use this project's bzip2 implementation as it performed best for our use case (tested using https://github.com/ning/jvm-compressor-benchmark).
> However, when following the default examples using the wiki/example/javadoc pages (*), we were hitting a serious performance bottleneck.
> The reason: the suggested default "compress" operation on a file is very slow, maybe because of disk I/O and a lack of caching.
> For a 2 MB tiff file, bzip2 compression takes about 3 seconds with code (A), whereas code (B) takes only about 0.5 seconds!
> So it would be good to adapt the documentation or take a look at the bottleneck.
> Kind regards
> Dominique
> A:
> >>>
> FileInputStream fin = new FileInputStream(infile);
> BufferedInputStream bufferin = new BufferedInputStream(fin);
> final FileOutputStream outStream = new FileOutputStream(outfile);
> CompressorOutputStream cos = new CompressorStreamFactory()
>         .createCompressorOutputStream(CompressorStreamFactory.BZIP2, outStream);
> IOUtils.copy(fin, cos);
> cos.close();
> >>>
> B:
> >>>
> final byte[] uncompressed = Files.readAllBytes(infile.toPath());
> ByteArrayOutputStream rawOut = new ByteArrayOutputStream(uncompressed.length);
> 		
> BZip2CompressorOutputStream out = new BZip2CompressorOutputStream(rawOut, COMPRESSION_LEVEL);
> out.write(uncompressed);
> out.close();
> FileOutputStream fos = new FileOutputStream(outfile);
> rawOut.writeTo(fos);
> fos.close();
> >>>>
> (*)
> Pages with documentation:
> https://wiki.apache.org/commons/Compress
> https://commons.apache.org/proper/commons-compress/examples.html
> https://commons.apache.org/proper/commons-compress/javadocs/api-release/index.html


