Posted to httpclient-users@hc.apache.org by Royan <ro...@mail.ru> on 2010/01/20 17:20:45 UTC

HttpClient does not seem to correctly handle chunked response

We have an XML API service which splits reply XML data into chunks if it is
larger than a certain number of bytes. Here is a sample piece of the reply:

HTTP/1.1 200 OK
Server: nginx/0.6.35
Date: Wed, 20 Jan 2010 14:53:27 GMT
Content-Type: text/xml;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA
date=200710221139)/Tomcat-5.5
Connection: close

1f0d
<?xml version='1.0' encoding='UTF-8'?>
<root>
[...]
<Label><![CDATA[Some character data br
2000
oken in the middle of the string]]></Label>
[...]
<root>

0


The problem is that when this XML is retrieved via
httpResponse.getEntity().getContent(), I expect all chunks to be merged
into a single XML document with no service information (I'm talking about the
strange 2000 number appearing in the middle of the string).
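
For reference, the framing in the sample reply (a hexadecimal size line, the
chunk data, a CRLF, and a final zero-size chunk) can be illustrated with a
minimal sketch. This is not HttpClient's ChunkedInputStream; the class name
ChunkedBodySketch and its decode method are hypothetical, and the sketch
assumes the whole body is already in memory and ignores chunk extensions and
trailers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of HttpClient.
final class ChunkedBodySketch {

    /** Decodes a complete chunked body; chunk extensions and trailers are skipped. */
    static byte[] decode(byte[] wire) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int eol = indexOfCrlf(wire, pos);
            if (eol < 0) {
                throw new IllegalArgumentException("chunk-size line not terminated by CRLF");
            }
            String sizeLine = new String(wire, pos, eol - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';');          // drop optional chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size;
            try {
                size = Integer.parseInt(sizeLine.trim(), 16);   // sizes are hexadecimal
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("bad chunk size: " + sizeLine, e);
            }
            pos = eol + 2;
            if (size == 0) {
                return out.toByteArray();              // last chunk; trailers ignored
            }
            if (size < 0 || size > wire.length - pos - 2) {
                throw new IllegalArgumentException("truncated chunk");
            }
            out.write(wire, pos, size);                // only chunk data reaches the caller
            pos += size;
            if (wire[pos] != '\r' || wire[pos + 1] != '\n') {
                throw new IllegalArgumentException("CRLF expected at end of chunk");
            }
            pos += 2;
        }
    }

    private static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 0x16 = 22 bytes and 0x20 = 32 bytes of chunk data.
        String wire = "16\r\nSome character data br\r\n"
                + "20\r\noken in the middle of the string\r\n"
                + "0\r\n\r\n";
        byte[] body = decode(wire.getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(body, StandardCharsets.US_ASCII));
    }
}
```

With correct framing, the size lines never reach the decoded content, so a
size line showing up inside the XML suggests that the declared sizes and the
actual data lengths disagree somewhere between the origin server and the
client.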

In fact, the returned content is not always parsed correctly and contains such
service information, which in turn makes my XML parser throw an exception.

httpResponse.getEntity().isChunked() always returns "true".

Can anyone advise on what I am doing wrong, or otherwise explain
how to work around this issue?

Thanks,
Roman 
-- 
View this message in context: http://old.nabble.com/HttpClient-does-not-seem-to-correctly-handle-chunked-response-tp27244426p27244426.html
Sent from the HttpClient-User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: HttpClient does not seem to correctly handle chunked response

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2010-01-20 at 15:38 -0800, Royan wrote:
> Hello Oleg, thank you for your quick reply!
> 
> 
> First I'll ask a philosophical question. Is there any chance that such a bug may
> occur because the server which sends the XML reply uses HttpClient
> v.3 and my client which receives that reply is version 4?
> 

I seriously doubt that.

> 
> Could you please go into a bit more detail on what you mean by "context log
> of the session"?
> 

http://hc.apache.org/httpcomponents-client/logging.html
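
As a rough sketch of what that guide describes, wire and context logging can
be switched on through system properties. This assumes HttpClient 4.x with
Commons Logging falling back to its SimpleLog backend; the class name
WireLogConfig is hypothetical, and a Log4j or java.util.logging setup would
need the equivalent logger names configured in its own way.

```java
// Hypothetical helper; assumes Commons Logging with the SimpleLog backend.
final class WireLogConfig {

    /** Call before any HttpClient or Commons Logging class is first used. */
    static void enableWireAndContextLogging() {
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // Context logging: connection management and request execution.
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http", "DEBUG");
        // Wire logging: the raw bytes sent and received on the connection.
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http.wire", "DEBUG");
    }

    public static void main(String[] args) {
        enableWireAndContextLogging();
        System.out.println(System.getProperty(
                "org.apache.commons.logging.simplelog.log.org.apache.http.wire"));
    }
}
```

The same properties can be passed as -D options on the java command line
instead of being set in code.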

Oleg




Re: HttpClient does not seem to correctly handle chunked response

Posted by Royan <ro...@mail.ru>.
Hello Oleg, thank you for your quick reply!


First I'll ask a philosophical question. Is there any chance that such a bug may
occur because the server which sends the XML reply uses HttpClient
v.3 and my client which receives that reply is version 4?


Could you please go into a bit more detail on what you mean by "context log
of the session"?

What actually happens: I send raw XML in the body of an HTTP POST request.
When the reply is received, I eventually read it with XmlStreamReader (I
assume it reads from ChunkedInputStream), which fails with the following
exception:

Caused by: com.sun.xml.stream.XMLStreamException2: ParseError at
[row,col]:[659,37]
Message: The character sequence "]]>" must not appear in content unless used
to mark the end of a CDATA section.
[StackTrace goes on]

If I simply print the contents of the received XML, I can see the numbers
I mentioned earlier, which I assume are chunk markers.
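
One way to locate such numbers in a dumped reply is a simple scan for lines
that consist only of hexadecimal digits. The sketch below is a heuristic under
that assumption, not a parser: the class name ChunkMarkerScan is hypothetical,
and a legitimate text line that happens to be all hex digits would be reported
as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; a heuristic, not a chunked-encoding parser.
final class ChunkMarkerScan {

    // A line made up solely of 1 to 8 hex digits looks like a chunk-size line.
    private static final Pattern HEX_ONLY = Pattern.compile("[0-9a-fA-F]{1,8}");

    /** Returns the 1-based numbers of lines that look like stray chunk-size lines. */
    static List<Integer> suspectLines(String body) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = body.split("\\r?\\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (HEX_ONLY.matcher(lines[i].trim()).matches()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String dumped = "<Label><![CDATA[Some character data br\n"
                + "2000\n"
                + "oken in the middle of the string]]></Label>\n";
        System.out.println(suspectLines(dumped));
    }
}
```

Comparing the reported positions with the wire log shows whether the numbers
arrive as chunk framing or as part of the chunk data itself.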


I'll do my best to reproduce the problem and isolate it in a standalone
test, but unfortunately I cannot give you a public URL for the server
which generates the XML, as it is an internal server.


Thanks,
Roman




-- 
View this message in context: http://old.nabble.com/HttpClient-does-not-seem-to-correctly-handle-chunked-response-tp27244426p27250544.html
Sent from the HttpClient-User mailing list archive at Nabble.com.




Re: HttpClient does not seem to correctly handle chunked response

Posted by Khosro Asgharifard <kh...@yahoo.com>.
Hello,
I have the same problem:
 Error fetching news HTML for: Event[entryData=EntryData(title=Yemeni al-Qaeda branch a magnet for jihadists, url=http://feeds.washingtonpost.com/click.phdo?i=2657414efdbc5d7710278204dac31246)].
java.io.IOException: CRLF expected at end of chunk: 49/54
    at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
    at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
    at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
    at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
    at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
    at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
    at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:358)
    at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:173)
    at java.io.InputStreamReader.close(InputStreamReader.java:182)
    at net.htmlparser.jericho.Util.getString(Unknown Source)
    at net.htmlparser.jericho.Source.getString(Unknown Source)
    at net.htmlparser.jericho.Source.<init>(Unknown Source)
    at net.htmlparser.jericho.Source.<init>(Unknown Source)
    at ir.ideacenter.biz.service.NewsService.addNewNews(NewsService.java:116)
    at ir.ideacenter.biz.service.NewsService$$FastClassByCGLIB$$fa1e5631.invoke(<generated>)
    at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
    at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:692)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:625)
    at ir.ideacenter.biz.service.NewsService$$EnhancerByCGLIB$$a9589857.addNewNews(<generated>)
    at ir.ideacenter.biz.service.fetcher.NewsFetchListener.contentLoaded(NewsFetchListener.java:65)
    at ir.ideacenter.biz.crawler.AbstractCrawler.fireEvents(AbstractCrawler.java:56)
    at ir.ideacenter.biz.crawler.MultiThreadedNewsCrawler$L2Task.run(MultiThreadedNewsCrawler.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Khosro.





Re: HttpClient does not seem to correctly handle chunked response

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2010-01-20 at 08:20 -0800, Royan wrote:
> We have an XML API service which splits reply XML data into chunks if it is
> larger than a certain number of bytes. Here is a sample piece of the reply:
> 
> HTTP/1.1 200 OK
> Server: nginx/0.6.35
> Date: Wed, 20 Jan 2010 14:53:27 GMT
> Content-Type: text/xml;charset=UTF-8
> Transfer-Encoding: chunked
> Connection: keep-alive
> X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA
> date=200710221139)/Tomcat-5.5
> Connection: close
> 
> 1f0d
> <?xml version='1.0' encoding='UTF-8'?>
> <root>
> [...]
> <Label><![CDATA[Some character data br
> 2000
> oken in the middle of the string]]></Label>
> [...]
> <root>
> 
> 0
> 
> 
> The problem is that when this XML is retrieved via
> httpResponse.getEntity().getContent(), I expect all chunks to be merged
> into a single XML document with no service information (I'm talking about the
> strange 2000 number appearing in the middle of the string).
> 
> In fact, the returned content is not always parsed correctly and contains such
> service information, which in turn makes my XML parser throw an exception.
> 
> httpResponse.getEntity().isChunked() always returns "true".
> 
> Can anyone advise on what I am doing wrong, or otherwise explain
> how to work around this issue?
> 
> Thanks,
> Roman 

I cannot recall a single confirmed problem with the correctness of the
chunk coding code in HttpClient in the 7 (seven) years I have been a
committer on the project.

Double-check your code. If you are reasonably sure this is not an issue
with your code, post a _COMPLETE_ wire / context log of the session and
a test case reproducing the problem (preferably self-contained).
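
A self-contained test case for this kind of problem usually needs a stub
server that emits a known chunked reply on a loopback port. The sketch below
shows one possible shape using only the JDK. Note the assumptions: the class
name ChunkedStubServer is hypothetical, and java.net.HttpURLConnection is used
as a stand-in client so the example runs without extra libraries; a real
reproduction would issue the request with the HttpClient code under test.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical test scaffold; HttpURLConnection stands in for the client under test.
final class ChunkedStubServer {

    /** Serves the given chunks once on a loopback port and returns the body the client decoded. */
    static String fetchFromStub(String... chunks) {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread worker = new Thread(() -> serveOnce(server, chunks));
            worker.setDaemon(true);
            worker.start();
            URI uri = URI.create("http://127.0.0.1:" + server.getLocalPort() + "/");
            HttpURLConnection conn =
                    (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void serveOnce(ServerSocket server, String[] chunks) {
        try (Socket socket = server.accept()) {
            InputStream in = socket.getInputStream();
            // Consume the request head up to the blank line (CR LF CR LF).
            int state = 0;
            int b;
            while (state < 4 && (b = in.read()) != -1) {
                boolean expected = (state % 2 == 0) ? b == '\r' : b == '\n';
                state = expected ? state + 1 : (b == '\r' ? 1 : 0);
            }
            OutputStream out = socket.getOutputStream();
            out.write(("HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/plain;charset=UTF-8\r\n"
                    + "Transfer-Encoding: chunked\r\n"
                    + "Connection: close\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
            for (String chunk : chunks) {
                byte[] data = chunk.getBytes(StandardCharsets.UTF_8);
                if (data.length == 0) {
                    continue;   // a zero-size chunk would end the body early
                }
                out.write((Integer.toHexString(data.length) + "\r\n")
                        .getBytes(StandardCharsets.US_ASCII));
                out.write(data);
                out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
            }
            out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException ignored) {
            // The client side will surface the failure as its own IOException.
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchFromStub("Some character data br",
                "oken in the middle of the string"));
    }
}
```

Because the stub controls the exact bytes on the wire, the same scaffold can
replay a captured reply, including a deliberately wrong chunk size, to check
how the client reacts.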

Oleg






Re: HttpClient does not seem to correctly handle chunked response

Posted by dpravn <dp...@gmail.com>.
Hi Roman,

I am facing the same situation with HttpClient. For some reason JBoss 4.2.2
with Tomcat 5.5 doesn't send a proper chunked response.

Can you please let me know whether your issue has been resolved? Is it a bug
in the Tomcat server or in HttpClient?

Any help would be quite useful.

Thanks
Praveen




-- 
View this message in context: http://old.nabble.com/HttpClient-does-not-seem-to-correctly-handle-chunked-response-tp27244426p29172510.html
Sent from the HttpClient-User mailing list archive at Nabble.com.

