You are viewing a plain text version of this content. The canonical link for it is here.
Posted to httpclient-users@hc.apache.org by Sebastiano Vigna <vi...@di.unimi.it> on 2014/02/15 11:22:39 UTC

Incredibly slow gzip downloads

Something happened from HttpClient 4.3.1 to 4.3.2.

All of a sudden, we are seeing an immense amount of CPU spent into inflating compressed HTTP responses. We were experimenting >5000 pages/s two months ago, but less than 1200 pages/s now (same hardware, etc.), for no apparent reason. The CPU is 100% consumed by Inflater.

The absolutely weird thing is that in profiling we are spending 15% doing inflation and 1.5% doing deflation... and the same amount of data is going through the two phases. Downgrading to 4.3.1 immediately brought back the old performance, and we spend more time in deflation than in inflation (as it should be).

Maybe some change in the buffer size?

Ciao,

					seba


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Incredibly slow gzip downloads

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Sat, 2014-02-15 at 17:06 +0100, Sebastiano Vigna wrote:
> On 15 Feb 2014, at 12:47 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
> 
> > The problem mostly likely has been introduced by HTTPCLIENT-1432 [1]. I
> > reviewed the patch once more and could not find anything obviously wrong
> > with it. Try reverting the changes introduced by HTTPCLIENT-1432 and see
> 
> Subclassing InputStream without overriding the multi-byte read() method is a recipe for disaster... the inherited method does a byte-by-byte read. You can see what's happening in this hprof trace:
> 
>         java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
>         java.util.zip.Inflater.inflate(Inflater.java:259)
>         java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
>         java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
>         java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
>         org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
>         java.io.InputStream.read(InputStream.java:179)
>         it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)
> 
> copyContent() would love to read(byte[],int,int) in a buffer, but since LazyDecompressingInputStream doesn't override it it invokes instead the read-byte-by-byte inherited method in InputStream, which in turn now calls for each byte the one-byte read() method from LazyDecompressingInputStream, which invokes the one-byte read method from InflaterInputStream, which does a multi-byte, length-one read from GZIPInputStream, which unleashes a similar call on InflaterInputStream, which unfortunately makes a similar read using the native inflateBytes() method. 
> 
> The result is a 10-50x decrease in speed, but I think that a trivial override of read(byte[],int,int) in LazyDecompressingInputStream will solve the problem.
> 

Sebastiano

Could you please raise a JIRA for this defect and tag it as regression?

Oleg



---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Incredibly slow gzip downloads

Posted by Sebastiano Vigna <vi...@di.unimi.it>.
On 15 Feb 2014, at 12:47 PM, Oleg Kalnichevski <ol...@apache.org> wrote:

> The problem mostly likely has been introduced by HTTPCLIENT-1432 [1]. I
> reviewed the patch once more and could not find anything obviously wrong
> with it. Try reverting the changes introduced by HTTPCLIENT-1432 and see

Subclassing InputStream without overriding the multi-byte read() method is a recipe for disaster... the inherited method does a byte-by-byte read. You can see what's happening in this hprof trace:

        java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
        java.util.zip.Inflater.inflate(Inflater.java:259)
        java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
        java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
        java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
        org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
        java.io.InputStream.read(InputStream.java:179)
        it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)

copyContent() would love to read(byte[],int,int) in a buffer, but since LazyDecompressingInputStream doesn't override it it invokes instead the read-byte-by-byte inherited method in InputStream, which in turn now calls for each byte the one-byte read() method from LazyDecompressingInputStream, which invokes the one-byte read method from InflaterInputStream, which does a multi-byte, length-one read from GZIPInputStream, which unleashes a similar call on InflaterInputStream, which unfortunately makes a similar read using the native inflateBytes() method. 

The result is a 10-50x decrease in speed, but I think that a trivial override of read(byte[],int,int) in LazyDecompressingInputStream will solve the problem.

Ciao,

					seba


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Incredibly slow gzip downloads

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Sat, 2014-02-15 at 11:22 +0100, Sebastiano Vigna wrote:
> Something happened from HttpClient 4.3.1 to 4.3.2.
> 
> All of a sudden, we are seeing an immense amount of CPU spent into inflating compressed HTTP responses. We were experimenting >5000 pages/s two months ago, but less than 1200 pages/s now (same hardware, etc.), for no apparent reason. The CPU is 100% consumed by Inflater.
> 
> The absolutely weird thing is that in profiling we are spending 15% doing inflation and 1.5% doing deflation... and the same amount of data is going through the two phases. Downgrading to 4.3.1 immediately brought back the old performance, and we spend more time in deflation than in inflation (as it should be).
> 
> Maybe some change in the buffer size?
> 
> Ciao,
> 

Here's the diff between 4.3.1 and 4.3.2

https://github.com/apache/httpclient/compare/4.3.1...4.3.2

The problem mostly likely has been introduced by HTTPCLIENT-1432 [1]. I
reviewed the patch once more and could not find anything obviously wrong
with it. Try reverting the changes introduced by HTTPCLIENT-1432 and see
if that resolves the problem.

Oleg

[1] https://issues.apache.org/jira/browse/HTTPCLIENT-1432




---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org