Posted to dev@commons.apache.org by Marc Saegesser <Ma...@apropos.com> on 2002/02/12 17:50:41 UTC

[httpclient] Lots of patches and discussion

Please excuse this really long post, but there's a lot to cover.

First, let me introduce myself.  My name is Marc Saegesser and I've been a
committer on the Jakarta-Tomcat project for over a year.  I was the release
manager for the Tomcat 3.2.2-3.2.4 releases.  I'm not currently a committer
on any Jakarta-Commons projects.

Now a question.  What is the status of the HttpClient 2.0 release?  The code
is currently tagged alpha 1 but the RELEASE_PLAN_2_0.txt document hasn't
been modified since October, 2001.  I ask because, depending on how imminent
an actual release is, some of the changes that I'm proposing should probably
be made on a separate branch.

Here's my story.  I have need of something like HttpClient in my product but
I found that I had to extend it somewhat.  The extensions are very generic
and I believe useful to others so I'd like to add to the HttpClient project.
I also found several bugs that I fixed along the way.  I've documented these
changes below.

I need to be able to use HttpClient (or a derivative) to navigate around the
web pretty much like a regular user-agent.  I want to be able to access any
site and any web application that I can reach with a reasonably modern
browser.  HttpClient does a good job of implementing the client side of RFC
2616.  Unfortunately, there are lots of sites and some very big name
applications that do not implement the server side correctly.  Some sites
(Yahoo! in particular) actually require a broken client implementation just
to log in.  Here are two examples of things I've found so far.
RFC2616/10.3.3 forbids changing a 302 redirected POST method into a GET
method but acknowledges that most clients are broken in this regard (this is
the failure that Yahoo! requires).  I have found sites that send relative
URLs in the Location: header of a redirect (this violates RFC2616/14.30).
Supporting these sites will require 'breaking' HttpClient.  I propose adding
some kind of flag to put HttpClient into a 'compatibility mode' that
implements this and any other required broken behaviour.
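
As a rough illustration of what such a flag could control (this is not the attached patch), the two behaviours mentioned above can be sketched as follows.  The class RedirectSketch, its method names, and the boolean strict parameter are all hypothetical and do not exist in HttpClient.

```java
import java.net.URI;

/** Hypothetical sketch; none of these names come from HttpClient itself. */
final class RedirectSketch {

    /** Resolves a Location header value against the URI of the redirected request. */
    static URI resolveLocation(URI requestUri, String location, boolean strict) {
        URI target = URI.create(location);
        if (target.isAbsolute()) {
            return target;
        }
        if (strict) {
            // RFC 2616 section 14.30 expects an absolute URI in Location.
            throw new IllegalArgumentException(
                "Relative Location not allowed in strict mode: " + location);
        }
        // Relaxed mode: interpret the value relative to the original request.
        return requestUri.resolve(target);
    }

    /** Chooses the method for the request that follows a 302 response. */
    static String methodAfter302(String originalMethod, boolean strict) {
        if (strict || !"POST".equals(originalMethod)) {
            // Strict mode never rewrites the method.
            return originalMethod;
        }
        // Relaxed mode: mimic the common browser behaviour of turning POST into GET.
        return "GET";
    }
}
```

In relaxed mode, a Location of "/login" received for http://example.com/a/b would resolve to http://example.com/login, and a redirected POST would be re-issued as a GET.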

A second need is to provide a mechanism for getting user acknowledgment for
certain actions.  For example, when redirecting from secure to non-secure
sites.

I am going to start working on these changes next but I want to discuss them
with the HttpClient community to see if they feel they belong in the commons
HttpClient project or if the project should be forked.

Anyway, below is a description of the modified and new files.  The patches
and new files are attached.

Modified files...

Cookie.java
  -  Added support for old Netscape cookies.  The biggest difference is that
the test for valid domains is different for Netscape cookies and RFC 2109
cookies.
  -  Added space after the semicolons separating the values.  This is
required by sites that only implement the old Netscape cookie specification.
  -  Added additional date format for expiration times.
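
To make the difference in domain tests concrete, here is a minimal sketch.  It is not the code in the patch; it assumes the rules as written in the old Netscape cookie document (two periods for the seven special top-level domains, three otherwise) and in RFC 2109 section 4.3.2 (leading dot, an embedded dot, and no dots in the host part that precedes the domain).  The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the two domain checks; not taken from Cookie.java. */
final class CookieDomainSketch {

    private static final List<String> SPECIAL_TLDS =
        Arrays.asList("com", "edu", "net", "org", "gov", "mil", "int");

    /** Netscape rule: tail match plus a minimum number of periods in the domain. */
    static boolean validNetscapeDomain(String host, String domain) {
        String h = host.toLowerCase(Locale.ROOT);
        String d = domain.toLowerCase(Locale.ROOT);
        if (!h.endsWith(d)) {
            return false;
        }
        String tld = d.substring(d.lastIndexOf('.') + 1);
        int required = SPECIAL_TLDS.contains(tld) ? 2 : 3;
        return countDots(d) >= required;
    }

    /** RFC 2109 rule: leading dot, embedded dot, and no dots in the host prefix. */
    static boolean validRfc2109Domain(String host, String domain) {
        String h = host.toLowerCase(Locale.ROOT);
        String d = domain.toLowerCase(Locale.ROOT);
        if (!d.startsWith(".")) {
            return false;
        }
        int embedded = d.indexOf('.', 1);
        if (embedded < 0 || embedded == d.length() - 1) {
            return false; // no dot strictly inside the domain value
        }
        if (!h.endsWith(d)) {
            return false;
        }
        String prefix = h.substring(0, h.length() - d.length());
        return prefix.indexOf('.') < 0;
    }

    private static int countDots(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '.') {
                count++;
            }
        }
        return count;
    }
}
```

Under these assumptions a cookie set by a.b.example.com for the domain .example.com passes the Netscape test but fails the RFC 2109 test, which is the kind of difference a combined implementation has to account for.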

HttpConnection.java
  -  The write*() and print*() methods now throw HttpRecoverableException.  

HttpMethodBase.java
  -  Added a new exception class, HttpRecoverableException.  There are some
error conditions that we can try to recover from internally.  The biggest
one I found was when a server unexpectedly closed the socket.  In this case
we should just try to re-open the connection and try the request again.
  -  Fixed a problem with the handling of 100 status codes.  If we get a 100
after we've already sent the request body, RFC 2616 states that the response
should be ignored.  The current implementation incorrectly broke out of
the loop looking for the response.
  -  Always recreate the cookie header.  A redirect response may have
included additional cookies that we need to send with the redirected request
and the path may have changed thus requiring a different cookie set.
  -  Fixed readResponseBody implementation.  A new version of this function
also takes an output stream.  This makes it easier for subclasses to use
this implementation directly instead of having to re-implement it in order
to support things like saving the response to a file.
  -  Better support for responses that don't contain a Content-Length or
Transfer-Encoding header.  By the specification, if these headers are both
absent, the response has no body content.  In the real world what this means
is that the server probably didn't know the length when the response was
committed.  It just sends the response and closes the connection when the
body is complete.  This assumption falls apart when we get a response that
*cannot* contain a body.  In this case, the simple implementation keeps
reading looking for a response body and actually ends up reading the next
response headers as the body.  I've added a list of responses that,
according to the specification, can never have a body and fixed
readResponseBody() to not read a body for these responses.
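
The check for body-less responses can be sketched roughly as below.  This is an illustration rather than the patched readResponseBody(); it assumes the list from RFC 2616 section 4.3 (responses to HEAD, all 1xx responses, 204 and 304), and the class and method names are hypothetical.

```java
/** Hypothetical sketch; the real check lives in HttpMethodBase. */
final class ResponseBodySketch {

    /**
     * Returns false for responses that, per RFC 2616 section 4.3, never carry
     * a message body, so the caller must not try to read one.
     */
    static boolean mayHaveBody(String requestMethod, int statusCode) {
        if ("HEAD".equals(requestMethod)) {
            return false; // responses to HEAD never include a body
        }
        if (statusCode >= 100 && statusCode < 200) {
            return false; // informational responses
        }
        // 204 No Content and 304 Not Modified are the remaining body-less cases.
        return statusCode != 204 && statusCode != 304;
    }
}
```

A caller would consult mayHaveBody() before falling back to "read until the server closes the connection" for responses that lack both Content-Length and Transfer-Encoding.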

URIUtil.java
  -  Added getPath() method.  This method returns the path portion of a
given URL.  The only difference from java.net.URL.getPath() is that this
method returns "/" if the URL's path is empty.
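
The behaviour described for getPath() amounts to something like the following sketch.  It is not the code added to URIUtil; the class name PathSketch and the choice to wrap parsing errors in IllegalArgumentException are assumptions made for the example.

```java
import java.net.MalformedURLException;
import java.net.URI;

/** Hypothetical sketch of the getPath() behaviour; not the URIUtil code. */
final class PathSketch {

    /** Returns the path portion of the given URL, or "/" when the path is empty. */
    static String getPath(String url) {
        try {
            String path = URI.create(url).toURL().getPath();
            return path.isEmpty() ? "/" : path;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
    }
}
```

With this sketch, "http://example.com" yields "/" where java.net.URL.getPath() alone would return an empty string.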

GetMethod.java
  -  Switched to new HttpMethodBase.readResponseBody().

New files...

HttpMultiClient.java
  -  Replacement for HttpClient.  This class serves two purposes.  First it
handles off-site redirects.  Second, it is intended to be used within a
multithreaded application that, like a browser, may have more than one
request outstanding to a given server and have requests going to more than
one server.
  -  Since HttpMultiClient, unlike HttpClient, simultaneously handles
requests for multiple servers it can't use HttpMethod classes directly
because they only include path information, not server information.  A new
interface, HttpUrlMethod, is used that extends HttpMethod.

HttpSharedState.java
  -  A simple wrapper around HttpState to synchronize access to data.  This
is required to support the multi-threaded nature of HttpMultiClient.

HttpConnectionManager.java
  -  This is actually the heart of HttpMultiClient.  It keeps track of
available HttpConnections for host:port combinations.  The number of
connections to a given host:port is limited (per RFC 2616) and if the limit
is reached calls to getConnection() will block until a connection becomes
available.
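
The blocking behaviour can be pictured with the sketch below.  It only counts connections per host:port and is not the attached HttpConnectionManager; the class ConnectionLimiterSketch and its methods are hypothetical, and a real manager would hand out and reuse HttpConnection objects instead of counters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-host connection limit; not HttpConnectionManager. */
final class ConnectionLimiterSketch {

    private final int maxPerHost;
    private final Map<String, Integer> inUse = new HashMap<String, Integer>();

    ConnectionLimiterSketch(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    /** Blocks until a connection slot for host:port is free, then claims it. */
    synchronized void acquire(String host, int port) {
        String key = host + ":" + port;
        try {
            while (count(key) >= maxPerHost) {
                wait(); // woken by release()
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for " + key, e);
        }
        inUse.put(key, count(key) + 1);
    }

    /** Gives a slot back and wakes any threads blocked in acquire(). */
    synchronized void release(String host, int port) {
        String key = host + ":" + port;
        int n = count(key);
        if (n <= 1) {
            inUse.remove(key);
        } else {
            inUse.put(key, n - 1);
        }
        notifyAll();
    }

    /** Number of slots currently claimed for host:port. */
    synchronized int inUse(String host, int port) {
        return count(host + ":" + port);
    }

    private int count(String key) {
        Integer n = inUse.get(key);
        return n == null ? 0 : n;
    }
}
```

Constructing the limiter with a maximum of 2 would match the recommendation in RFC 2616 section 8.1.4 that a client keep no more than two connections to any one server.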

HttpRecoverableException.java
  -  Extends HttpException.  This exception is thrown when a potentially
recoverable error has occurred (e.g. a socket connection was closed
unexpectedly).  Higher level code can attempt to try the operation again.
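
What "try the operation again" might look like in higher level code is sketched below.  The names are hypothetical and the exception is a stand-in: it is unchecked here only to keep the example short, whereas the class described above extends HttpException.

```java
/** Hypothetical retry sketch; RecoverableException stands in for HttpRecoverableException. */
final class RetrySketch {

    /** Stand-in for a recoverable failure such as an unexpectedly closed socket. */
    static class RecoverableException extends RuntimeException {
        RecoverableException(String message) {
            super(message);
        }
    }

    /** One attempt at sending a request and reading its response. */
    interface Request<T> {
        T execute();
    }

    /** Runs the request, retrying after recoverable failures up to maxAttempts times. */
    static <T> T executeWithRetry(Request<T> request, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RecoverableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.execute();
            } catch (RecoverableException e) {
                // A real client would re-open the connection here before retrying.
                last = e;
            }
        }
        throw last;
    }
}
```

Bounding the number of attempts keeps a persistently failing server from causing an endless retry loop; any other exception type propagates immediately.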

HttpUrlMethod.java
  -  An interface that extends HttpMethod.  HttpUrlMethod classes are
initialized with a fully qualified URL instead of just the path component.

UrlGetMethod.java
UrlPostMethod.java
UrlDeleteMethod.java
UrlOptionsMethod.java
UrlPutMethod.java
  -  These classes extend their respective method classes and implement
HttpUrlMethod.

Marc Saegesser 
                                


Re: [httpclient] Lots of patches and discussion

Posted by dIon Gillard <di...@multitask.com.au>.
Marc Saegesser wrote:

[snip]

>Now a question.  What is the status of the HttpClient 2.0 release?  The code
>is currently tagged alpha 1 but the RELEASE_PLAN_2_0.txt document hasn't
>been modified since October, 2001.  I ask because, depending on how imminent
>an actual release is, some of the changes that I'm proposing should probably
>be made on a separate branch.  
>
It's waiting on the committers being comfortable that it's ready. I've 
been doing mainly maintenance on httpclient recently, so I'm not the 
best one to decide when it's ready to go.

>Here's my story.  I have need of something like HttpClient in my product but
>I found that I had to extend it somewhat.  The extensions are very generic
>and I believe useful to others so I'd like to add to the HttpClient project.
>I also found several bugs that I fixed along the way.  I've documented these
>changes below.
>
Cool.

>I need to be able to use HttpClient (or a derivative) to navigate around the
>web pretty much like a regular user-agent.  I want to be able to access any
>site and any web application that I can reach with a reasonably modern
>browser.  HttpClient does a good job of implementing the client side of RFC
>2616.  Unfortunately, there are lots of sites and some very big name
>applications that do not implement the server side correctly.  Some sites
>(Yahoo! in particular) actually require a broken client implementation just
>to log in.  Here are two examples of things I've found so far.
>RFC2616/10.3.3 forbids changing a 302 redirected POST method into a GET
>method but acknowledges that most clients are broken in this regard (this is
>the failure that Yahoo! requires).  I have found sites that send relative
>URLs in the Location: header of a redirect (this violates RFC2616/14.30).
>Supporting these sites will require 'breaking' HttpClient.  I propose adding
>some kind of flag to put HttpClient into a 'compatibility mode' that
>implements this and any other required broken behaviour.
>
This sounds like a great idea.

>A second need is to provide a mechanism for getting user acknowledgment for
>certain actions.  For example, when redirecting from secure to non-secure
>sites.
>
>I am going to start working on these changes next but I want to discuss them
>with the HttpClient community to see if they feel they belong in the commons
>HttpClient project or if the project should be forked.
>
You've emailed the development community. I'm not sure many of the 
'user' community hang out here. My preference on this one is that it 
belongs in httpclient as a strict vs relaxed mode.

>Anyway, below is a description of the modified and new files.  The patches
>and new files are attached.
>
>Modified files...
>
>Cookie.java
>  -  Added support for old Netscape cookies.  The biggest difference is that
>the test for valid domains is different for Netscape cookies and RFC 2109
>cookies
>  -  Added space after the semicolons separating the values.  This is
>required by sites that only implement the old Netscape cookie specification.
>  -  Added additional date format for expiration times.
>
>HttpConnection.java
>  -  The write*() and print*() methods now throw HttpRecoverableException.  
>
>HttpMethodBase.java
>  -  Added a new exception class, HttpRecoverableException.  There are some
>error conditions that we can try to recover from internally.  The biggest
>one I found was when a server unexpectedly closed the socket.  In this case
>we should just try to re-open the connection and try the request again.
>  -  Fixed a problem with the handling of 100 status codes.  If we get a 100
>after we've already sent the request body, RFC 2616 states that the response
>should be ignored.  The current implementation incorrectly broke out of
>the loop looking for the response.
>
This last one sounds like a bug that should be fixed anyway.

>
>  -  Always recreate the cookie header.  A redirect response may have
>included additional cookies that we need to send with the redirected request
>and the path may have changed thus requiring a different cookie set.
>
Ditto.

>
>  -  Fixed readResponseBody implementation.  A new version of this function
>also takes an output stream.  This makes it easier for subclasses to use
>this implementation directly instead of having to re-implement it in order
>to support things like saving the response to a file.
>  -  Better support for responses that don't contain a Content-Length or
>Transfer-Encoding header.  By the specification, if these headers are both
>absent, the response has no body content.  In the real world what this means
>is that the server probably didn't know the length when the response was
>committed.  It just sends the response and closes the connection when the
>body is complete.  This assumption falls apart when we get a response that
>*cannot* contain a body.  In this case, the simple implementation keeps
>reading looking for a response body and actually ends up reading the next
>response headers as the body.  I've added a list of responses that,
>according to the specification, can never have a body and fixed
>readResponseBody() to not read a body for these responses.
>
Again, sounds like another bug.

>URIUtil.java
>  -  Added getPath() method.  This method returns the path portion of a
>given URL.  The only difference from java.net.URL.getPath() is that this
>method returns "/" if the URL's path is empty.
>
>GetMethod.java
>  -  Switched to new HttpMethodBase.readResponseBody().
>
>New files...
>
>HttpMultiClient.java
>  -  Replacement for HttpClient.  This class serves two purposes.  First it
>handles off-site redirects.  Second, it is intended to be used within a
>multithreaded application that, like a browser, may have more than one
>request outstanding to a given server and have requests going to more than
>one server.
>  -  Since HttpMultiClient, unlike HttpClient, simultaneously handles
>requests for multiple servers it can't use HttpMethod classes directly
>because they only include path information, not server information.  A new
>interface, HttpUrlMethod, is used that extends HttpMethod.
>
>HttpSharedState.java
>  -  A simple wrapper around HttpState to synchronize access to data.  This
>is required to support the multi-threaded nature of HttpMultiClient.
>
>HttpConnectionManager.java
>  -  This is actually the heart of HttpMultiClient.  It keeps track of
>available HttpConnections for host:port combinations.  The number of
>connections to a given host:port is limited (per RFC 2616) and if the limit
>is reached calls to getConnection() will block until a connection becomes
>available.
>
>HttpRecoverableException.java
>  -  Extends HttpException.  This exception is thrown when a potentially
>recoverable error has occurred (e.g. a socket connection was closed
>unexpectedly).  Higher level code can attempt to try the operation again.
>
>HttpUrlMethod.java
>  -  An interface that extends HttpMethod.  HttpUrlMethod classes are
>initialized with a fully qualified URL instead of just the path component.
>
>UrlGetMethod.java
>UrlPostMethod.java
>UrlDeleteMethod.java
>UrlOptionsMethod.java
>UrlPutMethod.java
>  -  These classes extend their respective method classes and implement
>HttpUrlMethod.
>
>Marc Saegesser 
>
These all sound like good additions. What I think we need to work out is 
how we turn this on or off.

-- 
dIon Gillard, Multitask Consulting
http://www.multitask.com.au/developers




--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: [httpclient] Lots of patches and discussion

Posted by Remy Maucherat <re...@apache.org>.
> Please excuse this really long post, but there's a lot to cover.
>
> First, let me introduce myself.  My name is Marc Saegesser and I've been a
> committer on the Jakarta-Tomcat project for over a year.  I was the release
> manager for the Tomcat 3.2.2-3.2.4 releases.  I'm not currently a committer
> on any Jakarta-Commons projects.

Welcome :)

> Now a question.  What is the status of the HttpClient 2.0 release?  The code
> is currently tagged alpha 1 but the RELEASE_PLAN_2_0.txt document hasn't
> been modified since October, 2001.  I ask because, depending on how imminent
> an actual release is, some of the changes that I'm proposing should probably
> be made on a separate branch.

Maybe Rodney could comment on that.

> Here's my story.  I have need of something like HttpClient in my product but
> I found that I had to extend it somewhat.  The extensions are very generic
> and I believe useful to others so I'd like to add to the HttpClient project.
> I also found several bugs that I fixed along the way.  I've documented these
> changes below.
>
> I need to be able to use HttpClient (or a derivative) to navigate around the
> web pretty much like a regular user-agent.  I want to be able to access any
> site and any web application that I can reach with a reasonably modern
> browser.  HttpClient does a good job of implementing the client side of RFC
> 2616.  Unfortunately, there are lots of sites and some very big name
> applications that do not implement the server side correctly.  Some sites
> (Yahoo! in particular) actually require a broken client implementation just
> to log in.  Here are two examples of things I've found so far.
> RFC2616/10.3.3 forbids changing a 302 redirected POST method into a GET
> method but acknowledges that most clients are broken in this regard (this is
> the failure that Yahoo! requires).  I have found sites that send relative
> URLs in the Location: header of a redirect (this violates RFC2616/14.30).
> Supporting these sites will require 'breaking' HttpClient.  I propose adding
> some kind of flag to put HttpClient into a 'compatibility mode' that
> implements this and any other required broken behaviour.

That sounds reasonable.

> A second need is to provide a mechanism for getting user acknowledgment for
> certain actions.  For example, when redirecting from secure to non-secure
> sites.
>
> I am going to start working on these changes next but I want to discuss them
> with the HttpClient community to see if they feel they belong in the commons
> HttpClient project or if the project should be forked.
>
> Anyway, below is a description of the modified and new files.  The patches
> and new files are attached.
>
> Modified files...
>
> Cookie.java
>   -  Added support for old Netscape cookies.  The biggest difference is that
> the test for valid domains is different for Netscape cookies and RFC 2109
> cookies.
>   -  Added space after the semicolons separating the values.  This is
> required by sites that only implement the old Netscape cookie specification.
>   -  Added additional date format for expiration times.
>
> HttpConnection.java
>   -  The write*() and print*() methods now throw HttpRecoverableException.
>
> HttpMethodBase.java
>   -  Added a new exception class, HttpRecoverableException.  There are some
> error conditions that we can try to recover from internally.  The biggest
> one I found was when a server unexpectedly closed the socket.  In this case
> we should just try to re-open the connection and try the request again.
>   -  Fixed a problem with the handling of 100 status codes.  If we get a 100
> after we've already sent the request body, RFC 2616 states that the response
> should be ignored.  The current implementation incorrectly broke out of
> the loop looking for the response.
>   -  Always recreate the cookie header.  A redirect response may have
> included additional cookies that we need to send with the redirected request
> and the path may have changed thus requiring a different cookie set.
>   -  Fixed readResponseBody implementation.  A new version of this function
> also takes an output stream.  This makes it easier for subclasses to use
> this implementation directly instead of having to re-implement it in order
> to support things like saving the response to a file.
>   -  Better support for responses that don't contain a Content-Length or
> Transfer-Encoding header.  By the specification, if these headers are both
> absent, the response has no body content.  In the real world what this means
> is that the server probably didn't know the length when the response was
> committed.  It just sends the response and closes the connection when the
> body is complete.  This assumption falls apart when we get a response that
> *cannot* contain a body.  In this case, the simple implementation keeps
> reading looking for a response body and actually ends up reading the next
> response headers as the body.  I've added a list of responses that,
> according to the specification, can never have a body and fixed
> readResponseBody() to not read a body for these responses.
>
> URIUtil.java
>   -  Added getPath() method.  This method returns the path portion of a
> given URL.  The only difference from java.net.URL.getPath() is that this
> method returns "/" if the URL's path is empty.
>
> GetMethod.java
>   -  Switched to new HttpMethodBase.readResponseBody().
>
> New files...
>
> HttpMultiClient.java
>   -  Replacement for HttpClient.  This class serves two purposes.  First it
> handles off-site redirects.  Second, it is intended to be used within a
> multithreaded application that, like a browser, may have more than one
> request outstanding to a given server and have requests going to more than
> one server.
>   -  Since HttpMultiClient, unlike HttpClient, simultaneously handles
> requests for multiple servers it can't use HttpMethod classes directly
> because they only include path information, not server information.  A new
> interface, HttpUrlMethod, is used that extends HttpMethod.
>
> HttpSharedState.java
>   -  A simple wrapper around HttpState to synchronize access to data.  This
> is required to support the multi-threaded nature of HttpMultiClient.
>
> HttpConnectionManager.java
>   -  This is actually the heart of HttpMultiClient.  It keeps track of
> available HttpConnections for host:port combinations.  The number of
> connections to a given host:port is limited (per RFC 2616) and if the limit
> is reached calls to getConnection() will block until a connection becomes
> available.
>
> HttpRecoverableException.java
>   -  Extends HttpException.  This exception is thrown when a potentially
> recoverable error has occurred (e.g. a socket connection was closed
> unexpectedly).  Higher level code can attempt to try the operation again.
>
> HttpUrlMethod.java
>   -  An interface that extends HttpMethod.  HttpUrlMethod classes are
> initialized with a fully qualified URL instead of just the path component.
>
> UrlGetMethod.java
> UrlPostMethod.java
> UrlDeleteMethod.java
> UrlOptionsMethod.java
> UrlPutMethod.java
>   -  These classes extend their respective method classes and implement
> HttpUrlMethod.

From my point of view, these changes are fine as they don't seem to modify
the API too much (and if they did, that wouldn't be a big problem to me, as
I'm still using the HTTP client 1.0), and add some useful functionality.
I would be ok directly modifying HttpMethod, but I definitely could
understand if some didn't agree.

Remy

