Posted to dev@hc.apache.org by Oleg Kalnichevski <ol...@apache.org> on 2008/02/12 21:45:58 UTC

Re: Use cases of DefaultConnectingIOReactor & NHttpClientHandlerBase w/ HttpRequestExecutionHandler?

On Tue, 2008-02-12 at 15:24 -0500, Sam Berlin wrote:
> Whew!  Long subject line...
> 
> I've been looking into exactly how all these components interact, and
> it got me to wondering: how exactly are people using these objects (if
> anyone is) right now? 

Hi Sam

I suspect no one really uses HttpCore NIO for pure client-side HTTP stuff.
It is mostly being used to implement HTTP proxies.

>  The example NHttpClient is very simplistic in
> that it just sends a basic GET request for "/" to the three hosts and
> accepts any response.  Trying to turn the sample into a somewhat more
> real-world scenario (say, a crawler) would seem to involve placing a
> lot more information in the context of each request/response &
> attachments.  For instance, expanding the example into a crawler would
> require:
> 
>  1) On handleResponse, it parses the body for more links and adds them
> as potential outgoing requests in the context.
>  2) handleResponse somehow (?) 

IOControl#requestOutput will eventually cause
NHttpClientHandler#requestReady to fire, which in turn will invoke
HttpRequestExecutionHandler#submitRequest.

> triggers another submitRequest to be
> called with the right context.
>  3) submitRequest looks up the new context information and submits
> more requests.
>  4) submitRequest could limit the number of pipelined attempts to a
> given host by storing more data in the context and
> incrementing/decrementing the concurrent attempts, which
> handleResponse would need to manage.
> 
> It becomes a little harder to make it work if you want to use a
> variable number of connections and share the context.  I imagine there
> would need to be some kind of CrawlContext that's shared as the
> attachment among multiple connects and used within
> handleResponse/submitRequest.
> 
> So...  my question is, is this line of thought correct?  Am I thinking
> about the ConnectingReactor & ExecutionHandler the wrong way?

I think this line of thought is correct. However, several major pieces
of functionality are still missing, namely a connection manager and a
pipelining-capable NHttpClientHandler, before the NIO based components
can be of any use on the client side.

I am planning to start working on the NIO connection management code
immediately after the alpha3 release.   


>   Should
> there be some sort of more intricate tie-in between the request & the
> response it generates?  (And, the built-in question, how can
> handleResponse trigger another submitRequest to be called.)
> 

IOControl#requestOutput is your friend
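
To make that chain concrete, here is a rough, untested sketch of an
execution handler that feeds itself follow-up requests through the
context. The "crawler.queue" key, the seeding from the attachment and the
link extraction are made up for illustration, and I am assuming the
protocol handler binds the connection into the context under
ExecutionContext.HTTP_CONNECTION, as the stock handlers do as far as I
can tell:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.http.HttpRequest;
import org.apache.http.HttpResponse;
import org.apache.http.message.BasicHttpRequest;
import org.apache.http.nio.IOControl;
import org.apache.http.nio.protocol.HttpRequestExecutionHandler;
import org.apache.http.protocol.ExecutionContext;
import org.apache.http.protocol.HttpContext;

public class CrawlerExecutionHandler implements HttpRequestExecutionHandler {

    // Per-connection queue of paths still to be fetched; key is made up.
    private static final String QUEUE = "crawler.queue";

    // Spelling as in the interface.
    public void initalizeContext(final HttpContext context, final Object attachment) {
        Queue<String> queue = new ConcurrentLinkedQueue<String>();
        queue.add((String) attachment); // seed with the initial path
        context.setAttribute(QUEUE, queue);
    }

    public HttpRequest submitRequest(final HttpContext context) {
        Queue<String> queue = (Queue<String>) context.getAttribute(QUEUE);
        String path = queue.poll();
        // Returning null tells the protocol handler there is nothing to send yet.
        return path != null ? new BasicHttpRequest("GET", path) : null;
    }

    public void handleResponse(final HttpResponse response, final HttpContext context) {
        Queue<String> queue = (Queue<String>) context.getAttribute(QUEUE);
        // ... parse the entity, extract links, enqueue them ...
        queue.add("/some-extracted-link");

        // The connection that produced this event is in the context, and all
        // non-blocking connections implement IOControl; asking for an output
        // event eventually fires requestReady -> submitRequest again.
        IOControl ioctrl = (IOControl)
                context.getAttribute(ExecutionContext.HTTP_CONNECTION);
        ioctrl.requestOutput();
    }

    public void finalizeContext(final HttpContext context) {
        // no-op for this sketch
    }
}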

Hope this helps

Oleg

> Thanks!
> 
> Sam
> 




Re: Use cases of DefaultConnectingIOReactor & NHttpClientHandlerBase w/ HttpRequestExecutionHandler?

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Tue, 2008-02-12 at 18:11 -0500, Sam Berlin wrote: 
> > IOControl#requestOutput will eventually cause
> > NHttpClientHandler#requestReady to fire, which in turn will invoke
> > HttpRequestExecutionHandler#submitRequest.
> 
> Ok -- is it safe to assume that the IOControl can always be retrieved
> from the context supplied to the handler (with a constant key)?  Or
> do the handlers have to be written with the reactor in mind?
> 

Hi Sam,

All non-blocking connection objects always implement the IOControl
interface. Methods of IOControl instances are expected to be thread
safe, unlike the connection instances themselves, so you can always
stick a reference to an IOControl into the context of another connection
if you want to coordinate I/O operations across several connections.
And you always get a reference to the connection that caused an event in
the protocol handler.
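
For instance, a very rough sketch (the class name, the "register"
plumbing and the method names on this class are made up for
illustration; only the HttpCore types are real) of a shared per-swarm
object that collects each connection's IOControl so the writer thread
can throttle reads:

import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

import org.apache.http.nio.IOControl;
import org.apache.http.protocol.ExecutionContext;
import org.apache.http.protocol.HttpContext;

public class SwarmIOCoordinator {

    private final Set<IOControl> members = new CopyOnWriteArraySet<IOControl>();

    // Called from a protocol handler callback: the connection that caused
    // the event is in the context, and it implements IOControl.
    public void register(final HttpContext context) {
        IOControl ioctrl = (IOControl)
                context.getAttribute(ExecutionContext.HTTP_CONNECTION);
        members.add(ioctrl);
    }

    // IOControl methods are thread safe, so the disk writer can call these
    // from its own thread when it falls behind or catches up again.
    public void suspendAllReads() {
        for (IOControl ioctrl : members) {
            ioctrl.suspendInput();
        }
    }

    public void resumeAllReads() {
        for (IOControl ioctrl : members) {
            ioctrl.requestInput();
        }
    }
}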

Hope this helps

Oleg

> The plan is to use this as the core for swarmed downloads.  That is, a
> single "swarm" would connect to multiple hosts and download different
> ranges of a single file.  Each connection would have to suspend
> reading if the writing is bogged down.  There would be multiple swarms
> going on at once, so there's a lot of activity happening under the
> hood.  Within each swarm, information exchanged during headers (or
> through other means) is propagated to other members of the swarm.
> (Our current code is all hacked up over the years, and it's time to
> retire it.)
> 
> > I think this line of thought is correct. However, several major pieces
> > of functionality are still missing, namely a connection manager and a
> > pipelining-capable NHttpClientHandler, before the NIO based components
> > can be of any use on the client side.
> 
> The code we've used for nearly a decade doesn't have connection
> management or pipelining either, so no big deal there. :)
> 
> Sam
> 




Re: Use cases of DefaultConnectingIOReactor & NHttpClientHandlerBase w/ HttpRequestExecutionHandler?

Posted by Sam Berlin <sb...@gmail.com>.
> IOControl#requestOutput will eventually cause
> NHttpClientHandler#requestReady to fire, which in turn will invoke
> HttpRequestExecutionHandler#submitRequest.

Ok -- is it safe to assume that the IOControl can always be retrieved
from the context supplied to the handler (with a constant key)?  Or
do the handlers have to be written with the reactor in mind?

The plan is to use this as the core for swarmed downloads.  That is, a
single "swarm" would connect to multiple hosts and download different
ranges of a single file.  Each connection would have to suspend
reading if the writing is bogged down.  There would be multiple swarms
going on at once, so there's a lot of activity happening under the
hood.  Within each swarm, information exchanged during headers (or
through other means) is propagated to other members of the swarm.
(Our current code is all hacked up over the years, and it's time to
retire it.)

> I think this line of thought is correct. However, several major pieces
> of functionality are still missing, namely a connection manager and a
> pipelining-capable NHttpClientHandler, before the NIO based components
> can be of any use on the client side.

The code we've used for nearly a decade doesn't have connection
management or pipelining either, so no big deal there. :)

Sam

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org