Posted to httpclient-users@hc.apache.org by Eugeny N Dzhurinsky <bo...@redwerk.com> on 2006/07/13 14:43:28 UTC

URI unicity

I have a requirement to avoid duplicated URIs in the database.
For example, URLs like

http://somedomain.com/path1?query1 and http://somedomain.com/path2?query2
should be considered as same url, but

http://somedomain.com/~username1 and http://somedomain.com/~username2 are not,

and finally

http://somedomain.com/~username1/path1 and
http://somedomain.com/~username1/path2?query should be considered as same.

Is there any way I can perform this validation using URI class?

-- 
Eugene N Dzhurinsky

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: http response question

Posted by Roland Weber <ht...@dubioso.net>.
Hello Ole,

> I am POST:ing to a gSoap server which returns the following  http response:
> 
> Status: 200 OK\r\n
> Server: gSOAP/2.7\r\n
> Content-Type: text/xml; charset=utf-8\r\n
> Content-Length: 1361\r\n
> Connection: close\r\n \r\n

This is not an HTTP response. All HTTP responses start with HTTP/,
followed by the protocol version number. The status line must be
one of the following:

HTTP/1.1 200 OK
HTTP/1.0 200 OK

but surely not "Status:" anything.
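
A minimal sketch of that check, assuming you only want to test whether the first line of a response loosely follows the Status-Line shape from RFC 2616 (HTTP-Version, space, three-digit status code, optional reason phrase). The class and method names are made up for illustration and are not part of HttpClient.

```java
import java.util.regex.Pattern;

public class StatusLineCheck {
    // Loose approximation of: HTTP-Version SP Status-Code SP Reason-Phrase
    // (hypothetical helper, not HttpClient's own parser)
    private static final Pattern STATUS_LINE =
            Pattern.compile("^HTTP/\\d+\\.\\d+ \\d{3}( .*)?$");

    // Expects the first response line with the trailing CRLF already removed.
    static boolean looksLikeStatusLine(String firstLine) {
        return firstLine != null && STATUS_LINE.matcher(firstLine).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeStatusLine("HTTP/1.1 200 OK"));  // true
        System.out.println(looksLikeStatusLine("Status: 200 OK"));   // false
    }
}
```

The "Status:" form is what a CGI script hands to its web server, which suggests the gSOAP side is producing CGI-style output rather than a full HTTP response.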

> is there any setting in httpclient which enables
> it to handle the above response?

No.

cheers,
  Roland



http response question

Posted by Ole Matzura <ol...@eviware.com>.
Hi all!

I am POST:ing to a gSoap server which returns the following  http response:

Status: 200 OK\r\n 
Server: gSOAP/2.7\r\n 
Content-Type: text/xml; charset=utf-8\r\n 
Content-Length: 1361\r\n 
Connection: close\r\n 
\r\n

which results in

org.apache.commons.httpclient.ProtocolException: The server 10.203.133.2 failed
to respond with a valid HTTP response

(no status line!?)

is this indeed invalid? is there any setting in httpclient which enables 
it to handle the above response?

thanks for any help !

regards,

/Ole
eviware.com




Re: URI unicity

Posted by Roland Weber <ht...@dubioso.net>.
Hello Eugeny,

> No, I didn't. Could you please explain (or point me to documentation where it
> is explained) what exactly authority is?

These might help:
http://www.ietf.org/rfc/rfc2396.txt
http://wiki.apache.org/jakarta-httpclient/ReferenceMaterials

There is also the source code of the URI class,
though I recommend staying away from it.

Or you just create a really complex URI, like
http://uid:pwd@hostname:portnumber/path/and?query=with#anchor
and see what you get.
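
A rough sketch of that experiment, assuming the JDK's java.net.URI rather than HttpClient's own URI class (the getter names are similar, but this is not the same class). The port is written as 8080 because java.net.URI only reports host and port when the port is numeric. The class and method names are hypothetical.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class UriParts {
    // Splits a URI into its named components using java.net.URI getters.
    static Map<String, String> parts(String text) {
        URI uri = URI.create(text);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("authority", uri.getAuthority()); // userinfo@host:port
        parts.put("userInfo", uri.getUserInfo());
        parts.put("host", uri.getHost());
        parts.put("port", String.valueOf(uri.getPort()));
        parts.put("path", uri.getPath());
        parts.put("query", uri.getQuery());
        parts.put("fragment", uri.getFragment());
        return parts;
    }

    public static void main(String[] args) {
        parts("http://uid:pwd@hostname:8080/path/and?query=with#anchor")
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Here the authority comes out as uid:pwd@hostname:8080, which is the host plus the optional user info and port.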

cheers,
  Roland




Re: URI unicity

Posted by Eugeny N Dzhurinsky <bo...@redwerk.com>.
On Fri, Jul 14, 2006 at 06:55:09PM +0200, Roland Weber wrote:
> Hello Eugeny,
> 
> > Actually I did it this way, but I couldn't find a way to set the host in a URI.
> 
> That's a question of terminology. Have you tried to use get/setAuthority?
> Because the authority may also include a port number, for example.

No, I didn't. Could you please explain (or point me to documentation where it
is explained) what exactly authority is?

-- 
Eugene N Dzhurinsky



Re: URI unicity

Posted by Roland Weber <ht...@dubioso.net>.
Hello Eugeny,

> Actually I did it this way, but I couldn't find a way to set the host in a URI.

That's a question of terminology. Have you tried to use get/setAuthority?
Because the authority may also include a port number, for example.

cheers,
  Roland



Re: URI unicity

Posted by Eugeny N Dzhurinsky <bo...@redwerk.com>.
On Thu, Jul 13, 2006 at 06:33:03PM +0200, Roland Weber wrote:
> Hello Eugeny,
> 
> > http://somedomain.com/path1?query1 and http://somedomain.com/path2?query2
> > should be considered as same url, but
> why should /path1 be considered the same as /path2?
> It doesn't make sense to me, but it doesn't have to.

Actually, I need a way to prevent users from submitting URLs that
point to the same website.

> > http://somedomain.com/~username1 and http://somedomain.com/~username2 are not,
> > 
> > and finally
> > 
> > http://somedomain.com/~username1/path1 and
> > http://somedomain.com/~username1/path2?query should be considered as same.
> > 
> > Is there any way I can perform this validation using URI class?
> 
> If you expect the URI class to figure out what anybody might consider
> same or different, the answer is no. 

Sure I don't expect this ;)

> If you understand that it is your
> application that needs to do the comparison and the URI class is used
> only to hold the URIs, the answer is yes.
> What you have to do is to pick everything that is relevant for the
> comparison, and remove everything that should be ignored. Such a step
> is called normalization. Once you have your URLs normalized, you can
> then use plain .equals() to check whether they are equal.

Actually I did it this way, but I couldn't find a way to set the host in a URI.
For instance, I needed to remove a leading "www." from the host if present.
There is a getHost() method, but no setHost(). So I ended up comparing strings
built like this

url.getScheme()+"://"+updateHost+url.getPath()

And I don't like this approach.
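
One possible alternative to string concatenation, sketched here with the JDK's java.net.URI instead of HttpClient's URI class (an assumption, the two are different classes): since the object is immutable, a changed host means building a new URI from the old components. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class HostRewrite {
    // Returns a copy of the URI with a lowercased host and any leading "www." removed.
    // Caveat: components are passed through decoded, so percent-escapes may be
    // re-encoded differently than in the input.
    static URI withoutWww(URI uri) throws URISyntaxException {
        String host = uri.getHost();
        if (host == null) {
            return uri; // nothing to rewrite (e.g. relative or opaque URI)
        }
        String lower = host.toLowerCase(Locale.ROOT);
        if (lower.startsWith("www.")) {
            lower = lower.substring("www.".length());
        }
        return new URI(uri.getScheme(), uri.getUserInfo(), lower, uri.getPort(),
                uri.getPath(), uri.getQuery(), uri.getFragment());
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(withoutWww(URI.create("http://www.somedomain.com/path1?query1")));
    }
}
```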

-- 
Eugene N Dzhurinsky



Re: URI unicity

Posted by Roland Weber <ht...@dubioso.net>.
Hello Eugeny,

> http://somedomain.com/path1?query1 and http://somedomain.com/path2?query2
> should be considered as same url, but

why should /path1 be considered the same as /path2?
It doesn't make sense to me, but it doesn't have to.

> http://somedomain.com/~username1 and http://somedomain.com/~username2 are not,
> 
> and finally
> 
> http://somedomain.com/~username1/path1 and
> http://somedomain.com/~username1/path2?query should be considered as same.
> 
> Is there any way I can perform this validation using URI class?

If you expect the URI class to figure out what anybody might consider
same or different, the answer is no. If you understand that it is your
application that needs to do the comparison and the URI class is used
only to hold the URIs, the answer is yes.
What you have to do is to pick everything that is relevant for the
comparison, and remove everything that should be ignored. Such a step
is called normalization. Once you have your URLs normalized, you can
then use plain .equals() to check whether they are equal.
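
A minimal sketch of such a normalization for the rules in the question, assuming the JDK's java.net.URI and a made-up rule set: keep scheme and host, keep a leading "/~username" path segment if there is one, and drop everything else. The names SiteKey and siteKey are hypothetical.

```java
import java.net.URI;
import java.util.Locale;

public class SiteKey {
    // Reduces a URL to the parts that matter for the "same site" comparison.
    static String siteKey(String url) {
        URI uri = URI.create(url);
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("absolute URL with host expected: " + url);
        }
        StringBuilder key = new StringBuilder()
                .append(uri.getScheme().toLowerCase(Locale.ROOT))
                .append("://")
                .append(uri.getHost().toLowerCase(Locale.ROOT));
        String path = uri.getPath();
        // A first segment like "/~username1" identifies a separate site; keep only that.
        if (path != null && path.startsWith("/~")) {
            int end = path.indexOf('/', 1);
            key.append(end < 0 ? path : path.substring(0, end));
        }
        return key.toString(); // query, fragment and remaining path are ignored
    }

    public static void main(String[] args) {
        System.out.println(siteKey("http://somedomain.com/path1?query1"));
        System.out.println(siteKey("http://somedomain.com/~username1/path2?query"));
    }
}
```

Two URLs then count as duplicates when siteKey(a).equals(siteKey(b)).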

cheers,
  Roland
