Posted to dev@hc.apache.org by "Oleg Kalnichevski (JIRA)" <ji...@apache.org> on 2014/02/18 15:10:20 UTC

[jira] [Resolved] (HTTPCLIENT-1461) GZIP decoding is very slow

     [ https://issues.apache.org/jira/browse/HTTPCLIENT-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski resolved HTTPCLIENT-1461.
-------------------------------------------

       Resolution: Fixed
    Fix Version/s: 4.4 Alpha1
                   4.3.3

Fixed in SVN trunk and 4.3.x. Please review / re-test with the latest SVN snapshot.

Oleg

> GZIP decoding is very slow
> --------------------------
>
>                 Key: HTTPCLIENT-1461
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1461
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient
>    Affects Versions: 4.3.2
>            Reporter: Sebastiano Vigna
>            Priority: Critical
>              Labels: regression
>             Fix For: 4.3.3, 4.4 Alpha1
>
>
> In 4.3.1, LazyDecompressingInputStream was introduced. However, LazyDecompressingInputStream subclasses InputStream without overriding the multi-byte read() method, and the inherited method does a byte-by-byte read. 
> This is a stack trace showing what happens:
>        java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
>        java.util.zip.Inflater.inflate(Inflater.java:259)
>        java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
>        java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
>        java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
>        org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
>        java.io.InputStream.read(InputStream.java:179)
>        it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)
> copyContent() wants to read into a buffer with read(byte[],int,int), but since LazyDecompressingInputStream doesn't override that method, the call lands on the byte-by-byte implementation inherited from InputStream. For each byte, that implementation calls the one-byte read() method of LazyDecompressingInputStream, which invokes the one-byte read() method of InflaterInputStream, which does a multi-byte, length-one read on GZIPInputStream, which makes a similar call on InflaterInputStream, which in turn performs a length-one read through the native inflateBytes() method.
> Thus, for each byte there is a native-method call. The result is a 10x-50x increase in CPU usage, which turns into a 10x-50x decrease in speed if, as in our case, you have 7000 threads downloading in parallel.
> Overriding read(byte[],int,int) in LazyDecompressingInputStream will solve the problem:
>     @Override
>     public int read(byte[] b, int off, int len) throws IOException {
>         initWrapper();
>         return wrapperStream.read(b, off, len);
>     }
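
The delegation problem described in the report can be reproduced without GZIP at all. The following is only a minimal sketch, not HttpClient code: the class names (ReadDelegationDemo, CountingWrapper, BulkWrapper) and helper methods are hypothetical and exist purely for illustration. It counts how many single-byte read() calls a wrapper stream receives when a caller drains it through read(byte[],int,int), with and without the bulk override.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadDelegationDemo {

    /** Hypothetical wrapper that overrides only the one-byte read(), as in the report. */
    static class CountingWrapper extends InputStream {
        protected final InputStream in;
        int singleByteReads = 0;

        CountingWrapper(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            singleByteReads++; // every call here would be one native inflate call in the GZIP case
            return in.read();
        }
    }

    /** Same wrapper, but the bulk read is delegated straight to the wrapped stream. */
    static class BulkWrapper extends CountingWrapper {
        BulkWrapper(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len);
        }
    }

    /** Drains the stream through a 4 KiB buffer and returns the number of bytes read. */
    static int drain(InputStream s) {
        try {
            byte[] buf = new byte[4096];
            int total = 0;
            int n;
            while ((n = s.read(buf, 0, buf.length)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns how many one-byte read() calls the wrapper saw while being drained. */
    static int singleByteReadsFor(boolean overrideBulk, int size) {
        InputStream src = new ByteArrayInputStream(new byte[size]);
        CountingWrapper w = overrideBulk ? new BulkWrapper(src) : new CountingWrapper(src);
        drain(w);
        return w.singleByteReads;
    }

    public static void main(String[] args) {
        System.out.println("one-byte reads without override: " + singleByteReadsFor(false, 10000));
        System.out.println("one-byte reads with override:    " + singleByteReadsFor(true, 10000));
    }
}
```

Without the override, the wrapper sees at least one read() call per byte of content; with it, the one-byte method is never reached. This is the same effect the proposed override is meant to have on LazyDecompressingInputStream.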



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org