Posted to dev@hc.apache.org by Roland Weber <ht...@dubioso.net> on 2006/08/20 19:11:26 UTC

[HttpCore] Proxy Support

Hello all,

I've been waiting for a quiet week-end to collect my thoughts and
questions about proxy support in HttpCore, or HttpComponents in
general. Since a quiet week-end doesn't seem to come my way, I've
decided to write down what I have in mind right now, to get the
discussion going.

We have two interfaces HttpClientConnection and HttpProxyConnection,
along with a default implementation for each. Proxy is derived from
Client, both in the interface and default implementation.
I think it is a design flaw to separate plain and proxy connections.
Consider connection management: we want to create and manage a
number of connections, not knowing whether they'll be used through
a proxy or not. The proxy connection is not layered on top of a
plain connection, the class is derived from it. So we always have
to create proxy connections for the connection manager in order to
allow proxying. Then those proxy connections are used either as a
plain or as a proxy connection. Being proxied or not is a runtime
property, and can not be reflected in the class hierarchy.
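
A minimal sketch of the "runtime property" idea, in Java. The names
RouteInfo and PlainOrProxiedConnection are hypothetical and are not
part of HttpCore; the point is only that one connection class can serve
both cases when the proxy is data assigned at open time rather than a
subclass.

```java
import java.util.Objects;

/** Hypothetical value type describing where a connection points. */
final class RouteInfo {
    final String targetHost; // the origin server the request is meant for
    final String proxyHost;  // null when the connection is direct

    RouteInfo(String targetHost, String proxyHost) {
        this.targetHost = Objects.requireNonNull(targetHost, "targetHost");
        this.proxyHost = proxyHost;
    }

    boolean isProxied() {
        return proxyHost != null;
    }
}

/** One connection class for both cases; the route is set when it is opened. */
final class PlainOrProxiedConnection {
    private RouteInfo route; // null while the connection is closed

    void open(RouteInfo route) {
        this.route = Objects.requireNonNull(route, "route");
    }

    void close() {
        this.route = null;
    }

    boolean isOpen() {
        return route != null;
    }

    /** Answered at runtime from the current route, not from the class. */
    boolean isProxied() {
        return route != null && route.isProxied();
    }
}
```

A connection manager could then pool PlainOrProxiedConnection objects
without deciding up front how each one will be used.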

Another problem is that proxying is not transparent. There is a
check in HttpRequestExecutor.doEstablishConnection whether the
connection is pointing to the correct target host. It might have
been me who introduced that method as part of some refactoring.
The way it is used, the connection always points to the correct
host because the invocation argument is taken directly from the
connection. But in general, the decision whether a connection is
pointing appropriately can not be made without knowing whether
it is proxied. If it is connected to a proxy and using a tunnel,
then both proxy and ultimate target host have to be the intended
ones. If it is proxied without (real) tunnelling, it can be kept
alive even if the next request is going to a different target host
but through the same proxy.
I think the general idea of HttpCore was that a connection would
be established to the appropriate host or proxy prior to sending
the request, so that HttpCore doesn't have to bother. But that
idea is currently broken in HttpRequestExecutor. We either have
to repair the request executor (and adapt the async processor to
those changes), or we need a different idea for proxy handling
in HttpCore.
Proxy handling is also not transparent to a connection manager,
for the keep-alive reason mentioned above. If there is a connection
open to the proxy, and not tunnelled to a specific target host
(and not associated with some inappropriate authentication state),
then that connection can be re-used for a different target host.
I believe that the connection itself would be a good place to
implement the logic for deciding whether it is pointing correctly.
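
The reuse rules described above could be sketched roughly as follows.
ConnectionRoute and RouteCheck are hypothetical names, not HttpCore
classes, and the sketch deliberately ignores ports, schemes and
authentication state, all of which a real check would have to consider.

```java
import java.util.Objects;

/** Hypothetical description of where an open connection points. */
final class ConnectionRoute {
    final String targetHost; // origin server the request is meant for
    final String proxyHost;  // null for a direct connection
    final boolean tunnelled; // true once a tunnel through the proxy exists

    ConnectionRoute(String targetHost, String proxyHost, boolean tunnelled) {
        if (tunnelled && proxyHost == null) {
            throw new IllegalArgumentException("a tunnel requires a proxy");
        }
        this.targetHost = Objects.requireNonNull(targetHost, "targetHost");
        this.proxyHost = proxyHost;
        this.tunnelled = tunnelled;
    }
}

final class RouteCheck {
    /** Returns true if a connection opened for 'open' can serve 'wanted' as is. */
    static boolean pointsTo(ConnectionRoute open, ConnectionRoute wanted) {
        // The first hop must match: same proxy, or no proxy on both sides.
        if (!Objects.equals(open.proxyHost, wanted.proxyHost)) {
            return false;
        }
        // Direct connection: only the same target host will do.
        if (open.proxyHost == null) {
            return open.targetHost.equals(wanted.targetHost);
        }
        // Tunnelled: both proxy and ultimate target must be the intended ones.
        if (open.tunnelled || wanted.tunnelled) {
            return open.tunnelled && wanted.tunnelled
                    && open.targetHost.equals(wanted.targetHost);
        }
        // Proxied without a tunnel: any target through the same proxy is fine.
        return true;
    }
}
```

Whether this logic lives in the connection, in a connection manager or
in a separate helper is exactly the open design question; the sketch
only shows that the decision needs the proxy and tunnel information.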

A minor detail is that only the HttpProxyConnection has an
isSecure() method. A non-proxied connection can be secure, too :-)
Another minor detail is the ProxyHost class, which is not used
anywhere in the API, but only by the DefaultHttpProxyConnection.
I'm not sure whether it adds any value.

Finally, I am wondering where we'll plug in logic for proxy
selection. Before digging deeper into this, I thought we could
have a request interceptor that picks a proxy for the request.
But request interceptors are executed only after the connection
is available, and we need the information about the proxy before
requesting the connection from a connection manager. Also, there
are problems with proxy requests having a different request line
from non-proxy requests, which would be ugly to deal with in a
request interceptor.
Proxy selection will also affect HttpAsync. While it is possible
to design HttpCore so that it does not establish connections and
therefore does not have to know about a proxy, HttpAsync provides
a different interface. It is the responsibility of an HttpDispatcher
to establish connections, so those will need to know which proxy to
use. I'd like to discuss this now, so we can agree on a common
interface for HttpAsync and HttpClient.
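
One way to picture "select the proxy before asking for a connection" is
the sketch below. It uses the JDK's java.net.ProxySelector purely as a
stand-in for whatever selection interface gets agreed on; nothing in
this thread proposes it. FixedProxySelector, RoutePlanner and the
"localhost goes direct" rule are hypothetical illustrations.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;

/** Hypothetical selector: one fixed HTTP proxy, except for localhost. */
final class FixedProxySelector extends ProxySelector {
    private final Proxy proxy;

    FixedProxySelector(String host, int port) {
        // createUnresolved avoids a DNS lookup at construction time.
        this.proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port));
    }

    @Override
    public List<Proxy> select(URI uri) {
        if ("localhost".equalsIgnoreCase(uri.getHost())) {
            return Collections.singletonList(Proxy.NO_PROXY);
        }
        return Collections.singletonList(proxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // A real selector might remember failures; this sketch ignores them.
    }
}

final class RoutePlanner {
    /** Returns "host:port" of the first hop, decided before any connection exists. */
    static String firstHop(ProxySelector selector, URI target) {
        Proxy chosen = selector.select(target).get(0);
        if (chosen.type() == Proxy.Type.DIRECT) {
            int port = target.getPort();
            if (port == -1) {
                port = "https".equalsIgnoreCase(target.getScheme()) ? 443 : 80;
            }
            return target.getHost() + ":" + port;
        }
        InetSocketAddress address = (InetSocketAddress) chosen.address();
        return address.getHostString() + ":" + address.getPort();
    }
}
```

The result of firstHop is what a connection manager or an
HttpDispatcher would need to know in order to open or look up a
connection, which is why a request interceptor comes too late for it.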


OK, I think that's about all that has been bugging me about our
proxy support these last few months. Let me know what you think.

cheers,
  Roland

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-dev-help@jakarta.apache.org


Re: [HttpCore] Proxy Support

Posted by Roland Weber <ht...@dubioso.net>.
Hi Oleg,

>>Then we have to factor out the code to establish the connection
>>and turn the connection into a simple container for the socket,
>>which is filled from the outside.
> 
> Actually this is something we should seriously consider.

Ok, I'll ponder it for some sleepless nights or such :-)

>>Regarding the responsibilities of connections and connection managers,
>>my idea was to let the connection decide whether it is pointing to the
>>target, and if it does the connection manager (or dispatcher) can ask
>>the connection reuse strategy whether to keep it alive or not. Or
>>something along that line.
> 
> Maybe this is the right way to go for HttpAsync but I am not sure this
> is the case for HttpClient. Can you develop this code inside HttpAsync
> first and then we can decide whether it should be moved over to
> HttpCore.  

I don't insist on putting it into the connection, I just want to have
it in some place where it's reusable. I'll keep the stuff in HttpAsync
by extending some core interface or defining a new one.

> Can you look into decoupling the process of establishing a connection
> from the HttpClientConnection interface, if you happen to have some
> spare cycles left? I am stuck neck deep in my day job and NIO stuff and
> will have no time to look at HttpClientConnection until the first cut at
> HttpCore NIO extensions is complete.

I can't promise this weekend, nor next. I'm on a business trip again
next week. I'll keep thinking about it, and will come up with some
ideas eventually. But I also want to get a few lines done on HttpAsync,
especially now that somebody expressed interest :-)

cheers,
  Roland



Re: [HttpCore] Proxy Support

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Mon, 2006-08-21 at 21:43 +0200, Roland Weber wrote:
... 
> Then we have to factor out the code to establish the connection
> and turn the connection into a simple container for the socket,
> which is filled from the outside.
> 

Actually this is something we should seriously consider. The more I
think of it the more I like this approach. The gory details of
initializing a Socket and binding it to a specific connection instance
should be left up to an object factory.
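
As a rough sketch of that split, assuming nothing about the eventual
HttpCore API: the connection below is only a holder for a socket, and a
separate opener creates the socket and binds it. SocketHolderConnection,
ConnectionOpener and DirectOpener are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical connection that only holds a socket filled in from outside. */
final class SocketHolderConnection {
    private Socket socket;

    void bind(Socket socket) {
        if (socket == null) {
            throw new IllegalArgumentException("socket must not be null");
        }
        if (this.socket != null) {
            throw new IllegalStateException("connection is already bound");
        }
        this.socket = socket;
    }

    /** True while a socket is bound and has not been closed. */
    boolean hasSocket() {
        return socket != null && !socket.isClosed();
    }

    void close() throws IOException {
        if (socket != null) {
            try {
                socket.close();
            } finally {
                socket = null;
            }
        }
    }
}

/** Hypothetical factory-style interface: knows how to establish a connection. */
interface ConnectionOpener {
    void open(SocketHolderConnection conn, String host, int port) throws IOException;
}

/** Opens a plain socket; a tunnelling opener would be another implementation. */
final class DirectOpener implements ConnectionOpener {
    private final int connectTimeoutMillis;

    DirectOpener(int connectTimeoutMillis) {
        this.connectTimeoutMillis = connectTimeoutMillis;
    }

    @Override
    public void open(SocketHolderConnection conn, String host, int port) throws IOException {
        Socket s = new Socket();
        try {
            s.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            conn.bind(s);
        } catch (IOException | RuntimeException e) {
            s.close(); // do not leak the socket if connecting or binding fails
            throw e;
        }
    }
}
```

With this shape, tunnelling through a proxy becomes a second
ConnectionOpener rather than a subclass of the connection.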

> > The special case is not connection proxying but rather connection
> > tunneling. My first knee-jerk reaction was to put all the tunneling code
> > into a separate super class.
> 
> I think inheritance is the problem, not the solution here.
> 

All right. Let's try to do away with inheritance here and move this
logic into an object factory or some such.

> My problem is that I don't want to make HttpAsync dependent on a
> connection manager. If you have a look at the open issues I have
> created for HttpAsync, you'll notice that it needs a completely
> different connection management interface. I've already pushed that
> to a future release, to reduce the bulk of work to a more manageable
> size. In fact, I had a very bad day a few weeks ago when thinking
> about HttpAsync, and pushing connection management to a future
> release was the one thing that helped me get over that mood.
> 
> Regarding the responsibilities of connections and connection managers,
> my idea was to let the connection decide whether it is pointing to the
> target, and if it does the connection manager (or dispatcher) can ask
> the connection reuse strategy whether to keep it alive or not. Or
> something along that line.
> 

Maybe this is the right way to go for HttpAsync but I am not sure this
is the case for HttpClient. Can you develop this code inside HttpAsync
first and then we can decide whether it should be moved over to
HttpCore.  

> > My suggestion would be to port MTHCM to the new API, hack up a very
> > simple HttpClient prototype (no cookies, no authentication, no
> > redirects) and see if we end up with some generic aspects that may prove
> > useful in HttpAsync or HttpCore. It may be a little easier to observe
> > commonalities rather than trying to 'guess' them.
> 
> Do we need MTHCM for the proxy part?
> 

Probably not, but the MTHCM represents a real-life use case for us,
which we could use to see how well (or badly) the new proxy API fares.

Can you look into decoupling the process of establishing a connection
from the HttpClientConnection interface, if you happen to have some
spare cycles left? I am stuck neck deep in my day job and NIO stuff and
will have no time to look at HttpClientConnection until the first cut at
HttpCore NIO extensions is complete.

Cheers,

Oleg

> cheers,
>   Roland
> 




Re: [HttpCore] Proxy Support

Posted by Roland Weber <ht...@dubioso.net>.
Hi Oleg,

>> Being proxied or not is a runtime
>>property, and can not be reflected in the class hierarchy.
> 
> I am not sure I agree with that. From the RFC 2616 standpoint there is
> no difference between a proxied and a plain client HTTP connection.

I think this may be a rectangle vs. square problem. Mathematically,
every square is a rectangle. Still, it's not a good idea to derive
a square class from a rectangle class :-) (unless it is read-only)
RFC 2616 considers connections as something to send data over. But
our connection objects are more, since they include the logic to
establish the connection. Having connection objects that only know
how to establish a plain connection, and others that know how to
tunnel over a proxy sounds wrong to me.

> The connection itself should
> not be aware of this distinction.

Then we have to factor out the code to establish the connection
and turn the connection into a simple container for the socket,
which is filled from the outside.

> The special case is not connection proxying but rather connection
> tunneling. My first knee-jerk reaction was to put all the tunneling code
> into a separate super class.

I think inheritance is the problem, not the solution here.

>>If it is proxied without (real) tunnelling, it can be kept
>>alive even if the next request is going to a different target host
>>but through the same proxy.
> 
> Presently this is one of the deficiencies of HttpClient 3.x (MTHCM to be
> exact). We definitely should try to make HttpClient 4.0 a bit smarter
> about pooling proxied connections. 

OK. Will be tough though. I've taken a look at MTHCM recently.
It's massive.


>>I believe that the connection itself would be a good place to
>>implement the logic for deciding whether it is pointing correctly.
> 
> I rather lean toward keeping this kind of logic in a connection manager,
> but am open to considering alternative approaches. In my opinion the job
> of a connection is to shove around HTTP messages, whereas the decision
> about re-usability of connections should be left up to a connection
> manager.

My problem is that I don't want to make HttpAsync dependent on a
connection manager. If you have a look at the open issues I have
created for HttpAsync, you'll notice that it needs a completely
different connection management interface. I've already pushed that
to a future release, to reduce the bulk of work to a more manageable
size. In fact, I had a very bad day a few weeks ago when thinking
about HttpAsync, and pushing connection management to a future
release was the one thing that helped me get over that mood.

Regarding the responsibilities of connections and connection managers,
my idea was to let the connection decide whether it is pointing to the
target, and if it does the connection manager (or dispatcher) can ask
the connection reuse strategy whether to keep it alive or not. Or
something along that line.

>>A minor detail is that only the HttpProxyConnection has an
>>isSecure() method. A non-proxied connection can be secure, too :-)
>>Another minor detail is the ProxyHost class, which is not used
>>anywhere in the API, but only by the DefaultHttpProxyConnection.
>>I'm not sure whether it adds any value.
> 
> Let's fix it.

OK. I'll try to come up with a patch this week.


> My suggestion would be to port MTHCM to the new API, hack up a very
> simple HttpClient prototype (no cookies, no authentication, no
> redirects) and see if we end up with some generic aspects that may prove
> useful in HttpAsync or HttpCore. It may be a little easier to observe
> commonalities rather than trying to 'guess' them.

Do we need MTHCM for the proxy part?

cheers,
  Roland



Re: [HttpCore] Proxy Support

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Sun, 2006-08-20 at 19:11 +0200, Roland Weber wrote: 
> Hello all,
> 
> I've been waiting for a quiet week-end to collect my thoughts and
> questions about proxy support in HttpCore, or HttpComponents in
> general. Since a quiet week-end doesn't seem to come my way, I've
> decided to write down what I have in mind right now, to get the
> discussion going.
> 

Hi Roland,

Many thanks for bringing this up. I agree proxy support in HttpCore
needs more work and at this point is likely to be broken. I have not
revisited the client related classes in HttpCore for quite a while as I
was mostly preoccupied with the server side stuff.

> We have two interfaces HttpClientConnection and HttpProxyConnection,
> along with a default implementation for each. Proxy is derived from
> Client, both in the interface and default implementation.
> I think it is a design flaw to separate plain and proxy connections.
> Consider connection management: we want to create and manage a
> number of connections, not knowing whether they'll be used through
> a proxy or not. The proxy connection is not layered on top of a
> plain connection, the class is derived from it. So we always have
> to create proxy connections for the connection manager in order to
> allow proxying. Then those proxy connections are used either as a
> plain or as a proxy connection. Being proxied or not is a runtime
> property, and can not be reflected in the class hierarchy.
> 

I am not sure I agree with that. From the RFC 2616 standpoint there is
no difference between a proxied and a plain client HTTP connection. The
sole difference is the request-URI, which must be absolute in case of
requests sent over a proxied connection. The connection itself should
not be aware of this distinction.
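
For illustration, the request-URI difference can be sketched as below.
RequestLines is a hypothetical helper, not an HttpCore class; it assumes
HTTP/1.1 and a URI without a fragment, and it does not cover CONNECT
requests used for tunnelling.

```java
import java.net.URI;

/** Hypothetical helper that formats the request line for either case. */
final class RequestLines {
    static String format(String method, URI uri, boolean viaProxy) {
        String requestUri;
        if (viaProxy) {
            // Requests sent to a proxy carry the absolute URI.
            requestUri = uri.toString();
        } else {
            // Requests sent to the origin server carry path and query only.
            String path = uri.getRawPath();
            if (path == null || path.isEmpty()) {
                path = "/";
            }
            String query = uri.getRawQuery();
            requestUri = (query == null) ? path : path + "?" + query;
        }
        return method + " " + requestUri + " HTTP/1.1";
    }
}
```

For http://www.example.org/a?b=1 this yields "GET /a?b=1 HTTP/1.1"
on a direct connection and
"GET http://www.example.org/a?b=1 HTTP/1.1" through a proxy. The
connection that transports either line does not need to know which one
it is carrying.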

The special case is not connection proxying but rather connection
tunneling. My first knee-jerk reaction was to put all the tunneling code
into a separate super class.

> Another problem is that proxying is not transparent. There is a
> check in HttpRequestExecutor.doEstablishConnection whether the
> connection is pointing to the correct target host. It might have
> been me who introduced that method as part of some refactoring.
> The way it is used, the connection always points to the correct
> host because the invocation argument is taken directly from the
> connection. But in general, the decision whether a connection is
> pointing appropriately can not be made without knowing whether
> it is proxied. If it is connected to a proxy and using a tunnel,
> then both proxy and ultimate target host have to be the intended
> ones. If it is proxied without (real) tunnelling, it can be kept
> alive even if the next request is going to a different target host
> but through the same proxy.

Presently this is one of the deficiencies of HttpClient 3.x (MTHCM to be
exact). We definitely should try to make HttpClient 4.0 a bit smarter
about pooling proxied connections. 

> I think the general idea of HttpCore was that a connection would
> be established to the appropriate host or proxy prior to sending
> the request, so that HttpCore doesn't have to bother. But that
> idea is currently broken in HttpRequestExecutor. We either have
> to repair the request executor (and adapt the async processor to
> those changes), or we need a different idea for proxy handling
> in HttpCore.
> Proxy handling is also not transparent to a connection manager,
> for the keep-alive reason mentioned above. If there is a connection
> open to the proxy, and not tunnelled to a specific target host
> (and not associated with some inappropriate authentication state),
> then that connection can be re-used for a different target host.
> I believe that the connection itself would be a good place to
> implement the logic for deciding whether it is pointing correctly.
> 

I rather lean toward keeping this kind of logic in a connection manager,
but am open to considering alternative approaches. In my opinion the job
of a connection is to shove around HTTP messages, whereas the decision
about re-usability of connections should be left up to a connection
manager.

> A minor detail is that only the HttpProxyConnection has an
> isSecure() method. A non-proxied connection can be secure, too :-)
> Another minor detail is the ProxyHost class, which is not used
> anywhere in the API, but only by the DefaultHttpProxyConnection.
> I'm not sure whether it adds any value.
> 

Let's fix it.

> Finally, I am wondering where we'll plug in logic for proxy
> selection. Before digging deeper into this, I thought we could
> have a request interceptor that picks a proxy for the request.
> But request interceptors are executed only after the connection
> is available, and we need the information about the proxy before
> requesting the connection from a connection manager. Also, there
> are problems with proxy requests having a different request line
> from non-proxy requests, which would be ugly to deal with in a
> request interceptor.
> Proxy selection will also affect HttpAsync. While it is possible
> to design HttpCore so that it does not establish connections and
> therefore does not have to know about a proxy, HttpAsync provides
> a different interface. It is the responsibility of an HttpDispatcher
> to establish connections, so those will need to know which proxy to
> use. I'd like to discuss this now, so we can agree on a common
> interface for HttpAsync and HttpClient.
> 

My suggestion would be to port MTHCM to the new API, hack up a very
simple HttpClient prototype (no cookies, no authentication, no
redirects) and see if we end up with some generic aspects that may prove
useful in HttpAsync or HttpCore. It may be a little easier to observe
commonalities rather than trying to 'guess' them.

Oleg

> 
> OK, I think that's about all that has been bugging me about our
> proxy support these last few months. Let me know what you think.
> 
> cheers,
>   Roland
> 

