Posted to httpclient-users@hc.apache.org by James Howe <jw...@allencreek.com> on 2005/12/21 22:53:50 UTC

Problem retrieving XML from a URL (Newbie question)

I've just started using HTTPClient in my application.  I have some simple
code which performs a GET request on a URL which returns XML.  The XML
contains a numeric character reference for the copyright symbol, '&#169;',
but when I read the response, it seems that HTTPClient is converting it to
&copy;.  Is there some way to configure HTTPClient not to do this?  I'm
retrieving the XML content and then passing it on to another process which
attempts to parse it, and the parsing fails because the entity &copy; isn't
defined.  I need any references that arrive in numeric form to stay in
numeric form.  My HTTPClient code looks something like this:


    HttpClient client = new HttpClient();
    GetMethod method = new GetMethod("some url which returns XML");
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
            new DefaultHttpMethodRetryHandler(3, false));
    try {
        int statusCode = client.executeMethod(feed.hostConfiguration(), method);
        // ...
        result = method.getResponseBodyAsString();
    ...

If the returned XML contains &#169;, the content returned by  
getResponseBodyAsString contains &copy; instead.  I want the result to be  
the same content as what was sent by the server, not processed.
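
The parse failure described above can be reproduced without HttpClient.
XML predefines only five named entities (lt, gt, amp, apos, quot), so
&copy; is an error unless a DTD declares it, while the numeric reference
&#169; needs no declaration.  A minimal sketch using the JDK's built-in
parser; the class name EntityParseCheck, the helper parses(), and the
<rights> element are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EntityParseCheck {

    // Returns true if the JDK parser accepts the text as well-formed XML.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps them off stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for an undeclared entity such as &copy;.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<rights>&#169; 2005</rights>")); // true
        System.out.println(parses("<rights>&copy; 2005</rights>")); // false
    }
}
```

So if the downstream parser rejects the content, the text it received
really does contain &copy;, and the question becomes which step put it
there.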

I'm using HTTPClient 3.0.

Thanks!

-- 
James Howe

Contact: http://public.xdi.org/=James.Howe

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Problem retrieving XML from a URL (Newbie question)

Posted by James Howe <jw...@allencreek.com>.
Found my problem.  I hadn't realized that when I was inspecting the value
in my Java debugger, the XML had been put through a pretty-print method,
which was responsible for converting the numeric character references into
named entity references.

At least I now know where to remedy the problem.

Thanks for the help.
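
One way to rule out a debugger view or pretty-printer as the culprit is to
scan the raw string itself for references before anything formats it.  A
small sketch using only java.util.regex; the class and method names are
hypothetical, and the pattern is a simplification of the XML name rules
(ASCII names only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityReferenceScan {

    // Matches decimal (&#169;), hexadecimal (&#xA9;) and named (&copy;) references.
    private static final Pattern REFERENCE = Pattern.compile(
            "&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9._:-]*);");

    // Returns every reference found in the raw text, in order of appearance.
    static List<String> references(String raw) {
        List<String> found = new ArrayList<>();
        Matcher m = REFERENCE.matcher(raw);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [&#169;, &amp;]
        System.out.println(references("<rights>&#169; 2005 A &amp; B</rights>"));
    }
}
```

Calling references() on the string straight from the HTTP response and
again on the pretty-printed copy would show at which step &#169; turns
into &copy;.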


On Wed, 21 Dec 2005 19:33:24 -0500, Wade Chandler  
<hw...@yahoo.com> wrote:

> Use your web browser to hit the same URL and see what happens when you
> save the file to disk.
> Wade



-- 
James Howe

Contact: http://public.xdi.org/=James.Howe



Re: Problem retrieving XML from a URL (Newbie question)

Posted by Wade Chandler <hw...@yahoo.com>.
Use your web browser to hit the same URL and see what happens when you save the file to disk.
 
 Wade
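
The same check can be made from the Java side by writing the bytes the
client received straight to disk, with no String conversion in between,
and comparing that file with the one the browser saved.  A sketch using
only java.nio.file; the class name, both helpers, and the temp-file name
are hypothetical, and the byte array stands in for whatever the HTTP
client returned as the raw body:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class RawBodyDump {

    // Writes the body bytes to a new temp file exactly as given.
    static Path saveToTempFile(byte[] body) {
        try {
            Path target = Files.createTempFile("response", ".xml");
            return Files.write(target, body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if both files hold byte-for-byte identical content.
    static boolean sameBytes(Path a, Path b) {
        try {
            return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "<rights>&#169; 2005</rights>"
                .getBytes(StandardCharsets.US_ASCII);
        Path fromClient = saveToTempFile(body);
        // Stand-in for the file saved from the browser.
        Path fromBrowser = saveToTempFile(body);
        System.out.println(sameBytes(fromClient, fromBrowser));
    }
}
```

If the two files match, the client delivered what the server sent, and
any difference seen later was introduced after the response was read.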

----- Original Message ----
From: Oleg Kalnichevski <ol...@apache.org>
To: HttpClient User Discussion <ht...@jakarta.apache.org>
Sent: Wednesday, December 21, 2005 17:19:15
Subject: Re: Problem retrieving XML from a URL (Newbie question)

On Wed, 2005-12-21 at 16:53 -0500, James Howe wrote:
> I've just started using HTTPClient in my application.  I have some simple
> code which performs a GET request on a URL which returns XML.  The XML
> contains a numeric character reference for the copyright symbol, '&#169;',
> but when I read the response, it seems that HTTPClient is converting it to
> &copy;.

James,

HttpClient _never_ performs any content manipulation beyond simple byte
to char conversion.

Oleg



Re: Problem retrieving XML from a URL (Newbie question)

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2005-12-21 at 16:53 -0500, James Howe wrote:
> I've just started using HTTPClient in my application.  I have some simple
> code which performs a GET request on a URL which returns XML.  The XML
> contains a numeric character reference for the copyright symbol, '&#169;',
> but when I read the response, it seems that HTTPClient is converting it to
> &copy;.

James,

HttpClient _never_ performs any content manipulation beyond simple byte
to char conversion.

Oleg
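
To illustrate why byte-to-char conversion alone cannot produce &copy;: the
six characters of &#169; are plain ASCII, so they decode to the same text
under the common charsets.  A sketch with the JDK only; the class and
method names are hypothetical and no HttpClient call is involved:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ByteToCharCheck {

    // Decodes raw body bytes into text; this is all a byte-to-char step does.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        byte[] body = "<rights>&#169; 2005</rights>"
                .getBytes(StandardCharsets.US_ASCII);
        Charset[] charsets = {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8};
        for (Charset cs : charsets) {
            // The reference is still spelled &#169; after decoding.
            System.out.println(cs + ": " + decode(body, cs));
        }
    }
}
```

Decoding never rewrites a reference into another spelling; only something
that parses and re-serializes the XML, such as a pretty-printer, does
that.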
