Posted to dev@hc.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/09/09 20:40:00 UTC

[jira] [Commented] (HTTPCLIENT-2176) Premature end of Content-Length delimited message body but works with wget

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412827#comment-17412827 ] 

Tim Allison commented on HTTPCLIENT-2176:
-----------------------------------------

stdout from wget:

{noformat}
wget https://direitosculturais.com.br/pdf.php?id=151
--2021-09-09 16:32:47--  https://direitosculturais.com.br/pdf.php?id=151
Resolving direitosculturais.com.br (direitosculturais.com.br)... 191.6.210.158
Connecting to direitosculturais.com.br (direitosculturais.com.br)|191.6.210.158|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 216481 (211K) [application/save]
Saving to: ‘pdf.php?id=151’

pdf.php?id=151                100%[=================================================>] 211.41K   228KB/s    in 0.9s    

2021-09-09 16:32:49 (228 KB/s) - ‘pdf.php?id=151’ saved [216481/216481]
{noformat}

> Premature end of Content-Length delimited message body but works with wget
> --------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2176
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2176
>             Project: HttpComponents HttpClient
>          Issue Type: Task
>         Environment: httpclient: 4.5.13
> httpcore: 4.4.14
> java 11 (archaic): openjdk version "11.0.4" 2019-07-16
>            Reporter: Tim Allison
>            Priority: Major
>
> I'm doing a recrawl of truncated files from CommonCrawl in support of work on Apache Tika, and I've found a few files that I can download successfully with wget but that fail with httpclient with:
> {noformat}
> org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 216,481; received: 203,820)
> 	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
> 	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:198)
> 	at org.apache.http.impl.io.ContentLengthInputStream.close(ContentLengthInputStream.java:101)
> 	at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:142)
> 	at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
> 	at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
> 	at java.base/java.util.zip.InflaterInputStream.close(InflaterInputStream.java:232)
> 	at java.base/java.util.zip.GZIPInputStream.close(GZIPInputStream.java:137)
> 	at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:94)
> 	at FetcherTest.testBasic(FetcherTest.java:40)
> 	
> {noformat}
> The triggering file is: https://direitosculturais.com.br/pdf.php?id=151
> Example with all defaults:
> {noformat}
>         String url = "https://direitosculturais.com.br/pdf.php?id=151";
>         HttpClient client = HttpClientBuilder.create().build();
>         HttpGet get = new HttpGet(url);
>         HttpResponse r = client.execute(get);
>         Path output = Paths.get("/data/tmp.pdf");
>         try (InputStream is = r.getEntity().getContent()) {
>             Files.copy(is, output, StandardCopyOption.REPLACE_EXISTING);
>         }
> {noformat}
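
In the stack trace above, the exception surfaces from close() (GZIPInputStream.close down to ContentLengthInputStream.close) at the end of the try-with-resources block, not from a read inside Files.copy. The reported shortfall is 216,481 - 203,820 = 12,661 bytes. A minimal JDK-only sketch of a helper that records copy failures and close failures separately may help show whether the body had already been written in full when close() failed. The names CopyThenClose, Result and copyThenClose are hypothetical and not part of HttpClient, and whether the output is actually complete in this case is an assumption that would need checking against the wget copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical diagnostic helper; not part of HttpClient or Tika.
final class CopyThenClose {

    /** Outcome of one copy: bytes written, plus any error seen while copying or closing. */
    static final class Result {
        final long bytesCopied;
        final IOException copyError;   // non-null if a read or write failed mid-body
        final IOException closeError;  // non-null if close() on the input failed

        Result(long bytesCopied, IOException copyError, IOException closeError) {
            this.bytesCopied = bytesCopied;
            this.copyError = copyError;
            this.closeError = closeError;
        }
    }

    /**
     * Copies everything from in to out, then closes in.
     * Errors from the copy loop and from close() are captured separately
     * instead of being thrown. The caller remains responsible for closing out.
     */
    static Result copyThenClose(InputStream in, OutputStream out) {
        long total = 0;
        IOException copyError = null;
        IOException closeError = null;
        byte[] buf = new byte[8192];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            copyError = e;
        }
        try {
            in.close();
        } catch (IOException e) {
            closeError = e;
        }
        return new Result(total, copyError, closeError);
    }

    public static void main(String[] args) {
        // In-memory stand-in for a response stream whose close() fails,
        // loosely imitating the shape of the failure in the stack trace.
        InputStream in = new ByteArrayInputStream(new byte[1024]) {
            @Override
            public void close() throws IOException {
                throw new IOException("simulated failure on close");
            }
        };
        Result r = copyThenClose(in, new ByteArrayOutputStream());
        System.out.println("bytes copied: " + r.bytesCopied);
        System.out.println("copy error:   " + r.copyError);
        System.out.println("close error:  " + r.closeError);
    }
}
```

With the example above, the stream from r.getEntity().getContent() could be passed as in and Files.newOutputStream(output) as out. The resulting file size could then be compared with the 216481 bytes wget saved. A close-only error with a matching size would suggest the data arrived intact, but that remains to be confirmed.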



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
