Posted to dev@hc.apache.org by Rob Tice <ro...@k-int.com> on 2003/04/30 01:00:55 UTC

uri problems

Hi there

I am using HttpClient as the basis for analysing a large number and
variety of web pages.

I have come across several patterns which cause HttpClient problems.

Many of the pages I am analysing have spaces or '^' in the query part of
the URL. I have had to change the query bit set to allow these, as
HttpClient was blowing up with the following:

org.apache.commons.httpclient.URIException: escaped query not valid
            at org.apache.commons.httpclient.URI.setRawQuery(URI.java:3201)
            at org.apache.commons.httpclient.URI.setEscapedQuery(URI.java:3221)
            at org.apache.commons.httpclient.HttpMethodBase.getURI(HttpMethodBase.java:337)
            at com.k_int.OpenHarvest.robot.JHarvestRobot.processHttp(JHarvestRobot.java:408)
            at com.k_int.OpenHarvest.robot.JHarvestRobot.processNext(JHarvestRobot.java:108)
            at com.k_int.OpenHarvest.robot.JHarvestRobot.run(JHarvestRobot.java:725)

This is the change I made:

    //protected static final BitSet query = uric; this was the code

    protected static final BitSet query = new BitSet(256); // changed rob
    static
    {
      query.or(uric);
      query.set('^');
      query.set(0x20);
    }
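
For anyone not familiar with the bit-set approach, here is a minimal standalone sketch of how such a check behaves. It is not the HttpClient source: the class and method names are made up for illustration, and the "strict" set is an assumption based on the RFC 2396 uric characters plus '%'. It only shows why '^' trips the check and why widening the set as in the change above makes the same query pass.

```java
import java.util.BitSet;

/**
 * Illustrative stand-in for a BitSet-based query check. This is NOT
 * HttpClient's URI class; every name here is hypothetical.
 */
final class QueryBitSetSketch {

    /** Assumed strict set: RFC 2396 unreserved + reserved characters, plus '%'. */
    static BitSet strictQuery() {
        BitSet allowed = new BitSet(256);
        for (char c = 'a'; c <= 'z'; c++) allowed.set(c);
        for (char c = 'A'; c <= 'Z'; c++) allowed.set(c);
        for (char c = '0'; c <= '9'; c++) allowed.set(c);
        for (char c : "-_.!~*'()".toCharArray()) allowed.set(c);   // unreserved marks
        for (char c : ";/?:@&=+$,".toCharArray()) allowed.set(c);  // reserved
        allowed.set('%');                                          // starts an escape
        return allowed;
    }

    /** Widened set mirroring the change above: strict set plus '^' and space. */
    static BitSet lenientQuery() {
        BitSet allowed = new BitSet(256);
        allowed.or(strictQuery());
        allowed.set('^');
        allowed.set(0x20);
        return allowed;
    }

    /** Returns the index of the first character outside the set, or -1 if none. */
    static int firstInvalid(String query, BitSet allowed) {
        for (int i = 0; i < query.length(); i++) {
            if (!allowed.get(query.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String query = "pid=5MJ*M88035&sid=5MJ*B70%200HN^1&p=cd";
        // Strict set: reports the position of '^'.
        System.out.println(firstInvalid(query, strictQuery()));
        // Widened set: nothing is rejected, so -1 is printed.
        System.out.println(firstInvalid(query, lenientQuery()));
    }
}
```

Note that the sketch does not verify that '%' is followed by two hex digits; a real validator would presumably check that as well.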

 

Over to you guys :-) what do you want to do?

Regards

Rob Tice
Rob.tice@k-int.com


Re: uri problems

Posted by Ortwin Glück <or...@nose.ch>.
Please provide a JUnit Test Case that shows the problem.

Odi


RE: uri problems

Posted by Rob Tice <ro...@k-int.com>.
I'll look at upgrading then

Thanks



Rob



-----Original Message-----
From: Michael Becke [mailto:becke@u.washington.edu] 
Sent: 01 May 2003 18:43
To: Commons HttpClient Project
Subject: Re: uri problems

Rob,

My example did what you are saying, minus the execute part.  I tried 
again, but executed the method before calling getURI().  Same results 
though.  It seems to be working just fine.

Mike


---------------------------------------------------------------------
To unsubscribe, e-mail:
commons-httpclient-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail:
commons-httpclient-dev-help@jakarta.apache.org





RE: uri problems

Posted by Rob Tice <ro...@k-int.com>.
Upgraded to the latest nightly build and the problem went away

Cheers

Rob



-----Original Message-----
From: Michael Becke [mailto:becke@u.washington.edu] 
Sent: 01 May 2003 18:43
To: Commons HttpClient Project
Subject: Re: uri problems

Rob,

My example did what you are saying, minus the execute part.  I tried 
again, but executed the method before calling getURI().  Same results 
though.  It seems to be working just fine.

Mike


Re: uri problems

Posted by Michael Becke <be...@u.washington.edu>.
Rob,

My example did what you are saying, minus the execute part.  I tried 
again, but executed the method before calling getURI().  Same results 
though.  It seems to be working just fine.

Mike


RE: uri problems

Posted by Rob Tice <ro...@k-int.com>.
Hi Mike

The example that you have tried doesn't actually show my problem
(probably because I didn't explain it properly :)).

So

I can create the method and execute it fine. But when I subsequently use
a call to method.getURI() (after execution of the said method) it fails
with an 'escaped query not valid' exception.

Regards


Rob





-----Original Message-----
From: Michael Becke [mailto:becke@u.washington.edu] 
Sent: 01 May 2003 14:48
To: Commons HttpClient Project
Subject: Re: uri problems

Rob,

I tried to reproduce this error but was not successful.  Here's what I 
tried:

     String[] uris = {
         "http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN1&p=cd",
         "http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN1&pt=ca",
     };

     for (int i = 0; i < uris.length; i++) {
         GetMethod get = new GetMethod(uris[i]);
         URI uri = new URI(uris[i].toCharArray());

         System.out.println(uri.getHost());
         System.out.println(uri.getPath());
         System.out.println(uri.getQuery());
         System.out.println(uri.getURI());
         System.out.println(uri.getEscapedURI());
         System.out.println(get.getURI());
     }

And I received the following output:

www.nhs.uk
/localnhsservices/gp/return_gp_surgery.asp
pid=5MJ*M88035&sid=5MJ*B70 0HN1&p=cd
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70 0HN1&p=cd
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN1&p=cd
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%25200HN1&p=cd
www.nhs.uk
/localnhsservices/gp/return_gp_surgery.asp
pid=5MJ*M88035&sid=5MJ*B70 0HN1&pt=ca
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70 0HN1&pt=ca
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN1&pt=ca
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%25200HN1&pt=ca

Have you tried this with the latest nightly build of HttpClient?

On a related note, I'm not sure that the HttpMethodBase(String) 
constructor is handling URIs correctly.  The Javadocs indicate that the 
given URI should already be escaped, but the constructor uses the URI 
constructor for unescaped URIs.

Mike


Re: uri problems

Posted by Michael Becke <be...@u.washington.edu>.
Rob,

I tried to reproduce this error but was not successful.  Here's what I 
tried:

     String[] uris = {
         "http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN1&p=cd",
         "http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN1&pt=ca",
     };

     for (int i = 0; i < uris.length; i++) {
         GetMethod get = new GetMethod(uris[i]);
         URI uri = new URI(uris[i].toCharArray());

         System.out.println(uri.getHost());
         System.out.println(uri.getPath());
         System.out.println(uri.getQuery());
         System.out.println(uri.getURI());
         System.out.println(uri.getEscapedURI());
         System.out.println(get.getURI());
     }

And I received the following output:

www.nhs.uk
/localnhsservices/gp/return_gp_surgery.asp
pid=5MJ*M88035&sid=5MJ*B70 0HN1&p=cd
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70 0HN1&p=cd
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN1&p=cd
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%25200HN1&p=cd
www.nhs.uk
/localnhsservices/gp/return_gp_surgery.asp
pid=5MJ*M88035&sid=5MJ*B70 0HN1&pt=ca
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70 0HN1&pt=ca
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN1&pt=ca
http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%25200HN1&pt=ca

Have you tried this with the latest nightly build of HttpClient?

On a related note, I'm not sure that the HttpMethodBase(String) 
constructor is handling URIs correctly.  The Javadocs indicate that the 
given URI should already be escaped, but the constructor uses the URI 
constructor for unescaped URIs.
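
The %2520 in the last line of each output group above is what one would expect when an already-escaped string is escaped a second time: the '%' of %20 itself becomes %25. The following standalone sketch reproduces that effect in plain Java. It is not HttpClient code; the class name and the SAFE character list are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing how escaping an escaped string yields %2520. */
final class DoubleEscapeSketch {

    /** Characters left untouched. '%' is deliberately NOT in this list. */
    private static final String SAFE =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        + "-_.!~*'();/?:@&=+$,";

    /** Percent-encodes every UTF-8 byte whose character is not in SAFE. */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder();
        for (byte b : raw.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (SAFE.indexOf(c) >= 0) {
                out.append(c);
            } else {
                out.append('%').append(String.format("%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Raw input: the space is escaped once.
        System.out.println(escape("sid=5MJ*B70 0HN1"));
        // Already-escaped input: the '%' is escaped again.
        System.out.println(escape("sid=5MJ*B70%200HN1"));
    }
}
```

With this sketch the raw input comes out as sid=5MJ*B70%200HN1 and the already-escaped input as sid=5MJ*B70%25200HN1, which matches the pattern in the get.getURI() lines above.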

Mike


RE: uri problems

Posted by Rob Tice <ro...@k-int.com>.
Hi Mike

Anything like this causes the exception as shown

http://www.nhs.uk/localnhsservices/gp/return_gp_surgery.asp?pid=5MJ*M88035&sid=5MJ*B70%200HN^1&p=cd
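
As an alternative to patching the library's bit set, a crawler could clean up such query strings before handing them to the client. Below is a rough sketch in plain Java, not something taken from HttpClient: the class name, the SAFE list, and the choice of UTF-8 for non-ASCII characters are all assumptions. It escapes characters such as '^' and space but leaves existing %XX sequences alone, so %20 is not turned into %2520.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-processing step for lenient handling of sloppy queries. */
final class LenientQueryEscaper {

    /** ASCII characters that may appear unescaped in a query (assumed set). */
    private static final String SAFE =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        + "-_.!~*'();/?:@&=+$,";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** Escapes illegal characters, keeping well-formed %XX escapes as they are. */
    static String escapeIllegal(String query) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < query.length()) {
            int cp = query.codePointAt(i);
            boolean existingEscape = cp == '%'
                && i + 2 < query.length()
                && isHex(query.charAt(i + 1))
                && isHex(query.charAt(i + 2));
            if (existingEscape || (cp < 128 && SAFE.indexOf(cp) >= 0)) {
                out.appendCodePoint(cp);
            } else {
                // Encode the code point as UTF-8 and percent-escape each byte.
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeIllegal("pid=5MJ*M88035&sid=5MJ*B70%200HN^1&p=cd"));
    }
}
```

The trade-off is that a literal '%' followed by two hex digits is always treated as an escape, which may not be what the page author meant, so this is only a heuristic.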


Cheers

Rob


-----Original Message-----
From: Michael Becke [mailto:becke@u.washington.edu] 
Sent: 01 May 2003 13:19
To: Commons HttpClient Project
Subject: Re: uri problems

Hi Rob,

Sorry for the slow response.  Could you give some examples of valid  
URIs that do not work?

Mike


Re: uri problems

Posted by Michael Becke <be...@u.washington.edu>.
Hi Rob,

Sorry for the slow response.  Could you give some examples of valid  
URIs that do not work?

Mike
