Posted to httpclient-users@hc.apache.org by Eugeny N Dzhurinsky <bo...@redwerk.com> on 2006/06/15 17:21:52 UTC

URL escaping

Hi there!
I am facing some weird problems with URL encoding. For some reason this URL:

http://www.raresplendors.com/images/Fuchsia%20Paua%20Cufflinks%20+.jpg

is reported as an invalid escaped URL when I try to use it in GetMethod.

Is there any way to make HttpClient decide whether a URL is escaped or not?
-- 
Eugene N Dzhurinsky

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: URL escaping

Posted by Roland Weber <ht...@dubioso.net>.
bofh@redwerk.com wrote:

> But 'wget' can work with both escaped and unescaped URLs, as can
> Firefox/Mozilla/Internet Explorer. Maybe it is possible to know?

As Tatu pointed out correctly: it is possible to guess. But
HttpClient is not a browser, nor a full-blown user agent like wget.
http://wiki.apache.org/jakarta-httpclient/ForAbsoluteBeginners#head-a110969063be34fcd964aeba55ae23bea12ac232

HttpClient is an HTTP communication library you can use to
implement these kinds of applications. If anyone needs to
guess about the URLs they are using, please implement
the guesswork in the application.

cheers,
  Roland





Re: URL escaping

Posted by Tatu Saloranta <co...@yahoo.com>.
--- bofh@redwerk.com wrote:

> On Thu, Jun 15, 2006 at 06:00:25PM +0200, Roland Weber wrote:
> > Hello Eugeny,
> > > But what if I don't know whether the URI is escaped or not?
> > If you don't know it, how is HttpClient supposed to know?
> 
> But 'wget' can work with both escaped and unescaped URLs, as can
> Firefox/Mozilla/Internet Explorer. Maybe it is possible to know?

It is probably not possible to know for sure, but it
is always possible to guess. ;-)
Browsers are famous/notorious for guessing what broken
things (like invalid HTML) most likely mean. Wget (and
browsers) probably also use some simple heuristics to
guess which one it is. But such logic does not (IMO)
belong in low-level libraries.

-+ Tatu +-




Re: URL escaping

Posted by bo...@redwerk.com.
On Thu, Jun 15, 2006 at 06:00:25PM +0200, Roland Weber wrote:
> Hello Eugeny,
> > But what if I don't know whether the URI is escaped or not?
> If you don't know it, how is HttpClient supposed to know?

But 'wget' can work with both escaped and unescaped URLs, as can
Firefox/Mozilla/Internet Explorer. Maybe it is possible to know?



Re: URL escaping

Posted by Roland Weber <ht...@dubioso.net>.
Hello Eugeny,

> But what if I don't know whether the URI is escaped or not?

If you don't know it, how is HttpClient supposed to know?

Solutions ("xor", not "and"!):

1. Ask whomever it is you get the URL from for a specific format,
   either escaped or unescaped.

2. Try to unescape the URL, even if it may not be escaped.

Option 1 will give you more reliable results. Option 2 is a
best-guess approach that should work in most situations.

hope that helps,
  Roland
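
The best-guess approach from option 2 can live in the application, as suggested elsewhere in this thread. Below is a minimal sketch in plain Java (JDK only, no HttpClient classes). The class name EscapeGuess and the method guessEscaped are made up for illustration, and the rule applied is only one possible heuristic: treat the string as escaped unless it contains a raw space, a control or non-ASCII character, or a '%' that is not followed by two hex digits.

```java
/** Hypothetical helper: guesses whether a URL string is already percent-escaped. */
final class EscapeGuess {

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    /**
     * Returns false as soon as there is evidence that the string is NOT escaped.
     * Returns true otherwise. This is a guess, not a proof.
     */
    static boolean guessEscaped(String url) {
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            // A control character, raw space, DEL, or non-ASCII character
            // cannot appear in an escaped URL.
            if (c <= ' ' || c > '~') {
                return false;
            }
            if (c == '%') {
                // Every '%' must start a well-formed %XX sequence.
                if (i + 2 >= url.length()
                        || !isHexDigit(url.charAt(i + 1))
                        || !isHexDigit(url.charAt(i + 2))) {
                    return false;
                }
                i += 2; // skip the two hex digits
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The URL from this thread, and the same URL with raw spaces.
        System.out.println(guessEscaped(
            "http://www.raresplendors.com/images/Fuchsia%20Paua%20Cufflinks%20+.jpg"));
        System.out.println(guessEscaped(
            "http://www.raresplendors.com/images/Fuchsia Paua Cufflinks +.jpg"));
    }
}
```

The result could then decide whether the application hands the string to a URI constructor that expects escaped input or to one that does the escaping. The guess can be wrong: an unescaped file name that literally contains "%20" would be classified as escaped, which is why option 1 remains the more reliable choice.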



Re: URL escaping

Posted by Eugeny N Dzhurinsky <bo...@redwerk.com>.
On Thu, Jun 15, 2006 at 05:29:47PM +0200, Roland Weber wrote:
> Use the no-argument constructor of GetMethod, followed by setURI:
> http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/HttpMethodBase.html#setURI(org.apache.commons.httpclient.URI)
> 
> The URI class has plenty of constructors to choose from: some of them
> expect the URL to be escaped, others will do the escaping for you.
> 
> hope that helps,

But what if I don't know whether the URI is escaped or not?
I checked the API docs and found that I need to specify explicitly whether
the URI is escaped or not.

-- 
Eugene N Dzhurinsky



Re: URL escaping

Posted by Roland Weber <ht...@dubioso.net>.
Hi Eugeny,
Eugeny N Dzhurinsky wrote:
> Hi there!
> I am facing some weird problems with URL encoding. For some reason this URL:
> 
> http://www.raresplendors.com/images/Fuchsia%20Paua%20Cufflinks%20+.jpg
> 
> is reported as an invalid escaped URL when I try to use it in GetMethod.
> 
> Is there any way to make HttpClient decide whether a URL is escaped or not?

Use the no-argument constructor of GetMethod, followed by setURI:
http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/HttpMethodBase.html#setURI(org.apache.commons.httpclient.URI)

The URI class has plenty of constructors to choose from: some of them
expect the URL to be escaped, others will do the escaping for you.

hope that helps,
  Roland
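
The difference between the two kinds of constructors can be illustrated with the JDK's own java.net.URI. It is used here only as a stand-in so that the sketch runs without HttpClient on the classpath. It is not HttpClient's URI class, and its escaping rules may differ in detail. In java.net.URI the single-argument constructor expects an already escaped string, while the multi-argument constructors quote illegal characters, including '%' itself. The class name UriConstructors and its method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper showing "expects escaped" vs. "escapes for you" with java.net.URI. */
final class UriConstructors {

    /** Input is already escaped: the single-argument constructor parses it as-is. */
    static URI fromEscaped(String escapedUrl) {
        try {
            return new URI(escapedUrl);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Path is NOT escaped: the multi-argument constructor quotes illegal characters. */
    static URI fromUnescapedPath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI a = fromEscaped(
            "http://www.raresplendors.com/images/Fuchsia%20Paua%20Cufflinks%20+.jpg");
        URI b = fromUnescapedPath(
            "http", "www.raresplendors.com", "/images/Fuchsia Paua Cufflinks +.jpg");
        System.out.println(a.getRawPath());
        System.out.println(b.getRawPath());

        // Passing escaped text to the escaping constructor double-escapes it:
        // the '%' is itself quoted, so %20 becomes %2520.
        URI c = fromUnescapedPath(
            "http", "www.raresplendors.com", "/images/Fuchsia%20Paua.jpg");
        System.out.println(c.getRawPath());
    }
}
```

The last call shows why the caller has to know which form it holds: choosing the wrong constructor either fails on illegal characters or silently escapes the input a second time.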

