Posted to httpclient-users@hc.apache.org by Alex Talis <al...@yahoo.com> on 2007/02/23 03:59:35 UTC

Best balance between performance and resource usage

Hi all. I apologize for the huge post; if you can answer any of my questions, it would be a great help. I desperately need advice on the best way to use HttpClient to balance performance and resource utilization.

Here's how my application uses the library. I may have up to 100 Applet clients running at the same time on various hosts. These clients need to display information from different servers, but because applets can only connect back to the server from which they were loaded, they ask one central server to give them data from the remote server they're interested in. Applets use HttpClient to connect to CentralServer and include the URI of the remote server from which they need data as part of the request. CentralServer uses a static HttpClient instance to pass the Applet's request on to the other servers. There may be up to 20 such remote servers.

Each client can manage only one remote server at a time, so it'll connect only to that one server. Each client can, however, make multiple concurrent requests. So if 20 clients all decide to look at the same server and make 5 concurrent requests each, CentralServer will get hit with 100 requests, all for the same remote server. Of course, the other clients will still keep asking for data from other servers.


I'm trying to figure out the best way of using HttpClient, MultiThreadedHttpConnectionManager, pool sizes, and HostConfigurations to make my CentralServer (the component that sits in the middle and distributes requests from clients to remote servers) as efficient as possible. I've been through tutorials and mailing list archives, but I still can't quite figure out all the relationships between these concepts.


In CentralServer, all HTTP requests are made through the one static instance of HttpClient. CentralServer creates a new PostMethod for every request. Code from the "CentralServer" servlet is below, followed by my specific questions.

---------------

    private static HttpClient httpClient;
    private static MultiThreadedHttpConnectionManager httpConnectionManager;

    /*** Initialize the singleton HttpClient and connection manager when the servlet class is loaded.
    ***/
    static {
        HttpConnectionManagerParams params = new HttpConnectionManagerParams();
        
        params.setConnectionTimeout(10000);
        
        /*** I'm using arbitrary value of 200 for the max number of connections,
         *** but this is one of the values I need to pin down
         ***/
        params.setDefaultMaxConnectionsPerHost(200);
        params.setMaxTotalConnections(200);

        httpConnectionManager = new MultiThreadedHttpConnectionManager();
        httpConnectionManager.setParams(params);        
        
        // Configure a thread to check for and close idle connections.
        IdleConnectionTimeoutThread idleConnectionTimeoutThread = new IdleConnectionTimeoutThread();
        idleConnectionTimeoutThread.addConnectionManager(httpConnectionManager);
        
        // Check for idle connections to close every 15 seconds.
        idleConnectionTimeoutThread.setTimeoutInterval(15000);
        
        // Close connections that have been idle for at least 30 seconds.
        idleConnectionTimeoutThread.setConnectionTimeout(30000);
        idleConnectionTimeoutThread.start();
        
        httpClient = new HttpClient(httpConnectionManager);
    }

    private void makeRequest(HttpServletResponse response, String serverPath) {
        PostMethod method = new PostMethod(serverPath);
        method.addRequestHeader(CONTENT_TYPE_HEADER_NAME, CONTENT_TYPE_HEADER_VALUE);
        method.addParameter("serverCommandValueObject", "someParam");

        Reader reader = null;
        Writer writer = null;
        try {
            httpClient.executeMethod(method);
            reader = new InputStreamReader(method.getResponseBodyAsStream(), "UTF-8");
            writer = new BufferedWriter(new OutputStreamWriter(response.getOutputStream(), "UTF-8"));
            
            /*** What's a good buffer size? Does it matter? ***/
            char[] buffer = new char[1024];
            int numRead;
            while ((numRead = reader.read(buffer)) > 0) {
                writer.write(buffer, 0, numRead);
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        finally {
            // Close reader and writer; a failure closing one should not
            // prevent closing the other or releasing the connection.
            if (reader != null) { try { reader.close(); } catch (IOException ignored) { } }
            if (writer != null) { try { writer.close(); } catch (IOException ignored) { } }
            method.releaseConnection();
        }
    }

---------------
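As for the buffer-size question in the code above: a 1 KB char buffer is typical, and larger buffers give quickly diminishing returns because the servlet output stream does its own buffering. The copy-and-close pattern can be exercised on its own with plain java.io, no HttpClient involved (the CopyUtil class name is just for illustration):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class CopyUtil {
    /** Copies everything from reader to writer using a fixed-size char buffer. */
    static long copy(Reader reader, Writer writer, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        long total = 0;
        int numRead;
        while ((numRead = reader.read(buffer)) > 0) {
            writer.write(buffer, 0, numRead);
            total += numRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new StringReader("hello, world");
        Writer writer = new StringWriter();
        try {
            long copied = copy(reader, writer, 1024);
            System.out.println(copied + ":" + writer.toString());
        } finally {
            // Close both streams even if the copy failed.
            reader.close();
            writer.close();
        }
    }
}
```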

Questions:

1. Since I have a finite set of remote URIs, does it buy me anything to create HostConfiguration objects for each server and use them when I make requests? Like this:

    Map uriToHostMap = new HashMap();
    uriToHostMap.put("http://chef:18081/RemoteServlet", chefHostConfiguration);
    uriToHostMap.put("http://cartman:18081/RemoteServlet", cartmanHostConfiguration);
    uriToHostMap.put("http://kenny:18081/RemoteServlet", kennyHostConfiguration);
    
I can build the map up as new requests come in and then pass the correct HostConfiguration to httpClient, like this:

    PostMethod method = new PostMethod(serverPath);
    HostConfiguration host = (HostConfiguration)uriToHostMap.get(serverPath);
    httpClient.executeMethod(host, method);
    
Will this improve how the connection manager selects HttpConnection to reuse?


2. Since I have a finite number of servers, does it make sense to use per-host connection pool size? Or is it just as good to have one big pool of connections?


3. Is the following statement true: <Number of Hosts> * <MaxHostConnections> must be <= MaxTotalConnections?


4. Can I set MaxHostConnections to 100 and MaxTotalConnections to 200 and still connect to 20 hosts? Will the connections shift from connection pool for one host to connection pool of another as needed?


5. Is the purpose of MaxHostConnections simply to limit the number of connections that can be made to a given host? If so, then I can probably have just one global connection pool, because I don't want to put any limit on the number of per-host connections - I just want to keep the CentralServer system from running out of sockets.

Also, I can't predict which clients will look at which servers. It may be pretty evenly distributed at one moment, but then 50 clients may decide to look at the same remote server. And clients CAN send multiple concurrent requests.


6. I wanted to monitor pool size, so I periodically print the value of httpConnectionManager.getConnectionsInPool(), but I noticed that it never shrinks, even though I'm running the IdleConnectionTimeoutThread. I figured out that to shrink it, I have to call httpConnectionManager.deleteClosedConnections().

  (a) Do calls to closeIdleConnections completely release all system resources used by HttpConnection objects it closes? Meaning sockets, of course.
  
  (b) Is there a way to determine how many of the connections in a pool are closed (for tracking purposes)? I mean, an existing way, other than extending some class in the library?
  
  (c) Is it a good idea to call httpConnectionManager.deleteClosedConnections() once in a while? If an HttpConnection is not deleted, will it hold any resources, other than the memory it occupies?

  (d) Are there advantages to keeping closed connections in the pool? Is it faster to open an existing closed connection than to create a new one and open it?
  
Thank you very much.

Re: Best balance between performance and resource usage

Posted by Roland Weber <ht...@dubioso.net>.
Hi Alex,

> Below are the settings I'm going to use. Please tell me if I'm wrong
> about any. It's worth mentioning that all servers run in dedicated Tomcats,
> and the HTTP connector is configured with maxThreads="1000", so supposedly
> 1000 concurrent requests can be handled. Server hosts are beefy
> multi-processor systems with at least 2Gb of memory, and Tomcat is given
> 512m. Network connections are usually 100Mbps.

I'm not a Tomcat expert. But 1000 service threads sounds like an extremely
high number, even if the JVM has more than 1 GB of memory. If the service
is lightweight, you may get away with it. Have you _tested_ those servers
for load? I would expect them to desperately run garbage collection when
more than a few hundred requests have to be served simultaneously. If they
have to access some backend like a database which limits its own connections
to a few dozen, you'd also be wasting resources because most requests just
queue up waiting for a backend connection.
You can enable verbose GC in the server JVM to get performance data. If you
see the GC run every 5 seconds for 1 second, that means the JVM is spending
20% of its time on garbage collection. As a rule of thumb, 5% is
good, 10% should be the upper bound. If you can service 1000 connections
within those limits, you're fine.
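For reference, verbose GC can be switched on with JVM flags along these lines (HotSpot-style flags; the exact option names vary by JVM vendor and version, so treat this as a sketch):

```shell
# Sketch: enable GC logging for a Tomcat JVM (HotSpot-style flags;
# names vary by JVM vendor and version).
CATALINA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
export CATALINA_OPTS
```

Then compare the total GC pause time against wall-clock time over a load-test run to get the percentage.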
For backend access (if any), the pool settings for datasources and such
stuff have to be reviewed. On each of the backend servers. If either one
of them runs into overload, it can grab all connections and slow down the
whole system. And remember: there is _no_ way of telling whether a server
can sustain load _except_ running a load test on it. I've seen a single
misplaced synchronized statement in application code slow down a system
with 4 multiprocessor servers and plenty of memory to a grinding halt.

> MaxHostConnections = 1000
> MaxTotalConnections = 1000 
> CloseIdleConnectionsPeriod = 1 minute
> IdleConnectionTimeout = 3 minutes 
> DeleteClosedConnectionsPeriod = 10 minutes

The timeouts seem reasonable. For connections, see above.

> I decided to occasionally delete closed connections, just to be on the safe
> side. I ran the system overnight without any incoming connections, and the
> pool stayed at the max size it reached. Netstat did show that there were no
> open sockets, but it looks like HttpConnections never got deleted. I'll
> test it a bit more, maybe I'm missing something.

I didn't ask which version you're using. There have been some fixes to
idle connection handling in 3.1, like HTTPCLIENT-597 [1]. If you're on
3.0, calling deleteClosedConnections is a good idea. HttpClient 3.1 RC1
is going to be released in a few weeks, upgrading to that version would
even be better.

> Clients can track only one site at a time by design. These are rich, thick
> clients, and have to display a great deal of information.

I got your original mail wrong on this point. I thought that the single
instance of HttpClient in your CentralServer would be able to connect
to a single backend server only. Thanks for clearing that up.

cheers,
  Roland

[1] https://issues.apache.org/jira/browse/HTTPCLIENT-597


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Best balance between performance and resource usage

Posted by Alex Talis <al...@yahoo.com>.
Hi Roland

Thanks a lot for your quick response. You've clarified things.

Our main problem was that the CentralServer was getting BindExceptions and Clients were disconnecting. This was happening because the server was running out of sockets. I realized that we were doing it all wrong by creating individual HttpClients for each request and not using a single MTHCM. Below are the settings I'm going to use. Please tell me if I'm wrong about any. It's worth mentioning that all servers run in dedicated Tomcats, and the HTTP connector is configured with maxThreads="1000", so supposedly 1000 concurrent requests can be handled. Server hosts are beefy multi-processor systems with at least 2Gb of memory, and Tomcat is given 512m. Network connections are usually 100Mbps.

MaxHostConnections = 1000
MaxTotalConnections = 1000
CloseIdleConnectionsPeriod = 1 minute
IdleConnectionTimeout = 3 minutes
DeleteClosedConnectionsPeriod = 10 minutes

I decided to occasionally delete closed connections, just to be on the safe side. I ran the system overnight without any incoming connections, and the pool stayed at the max size it reached. Netstat did show that there were no open sockets, but it looks like HttpConnections never got deleted. I'll test it a bit more, maybe I'm missing something.



In case you're completely puzzled about why we have this silly architecture and have absolutely nothing else to do on your weekend :) I wanted to give a better description of what the system does. Feel free to ignore this!

All clients and servers are part of the same enterprise on the same intranet, and there are no limits imposed on the number of connections. The remote servers are not there to share load - they are actually at different geographical locations and allow users to track what's going on at that site. If a site in Sydney, Australia is in busy production time, most clients want to monitor that site and that's why most of the connections will go out to that site, while the other servers may be pretty idle. Clients can track only one site at a time by design. These are rich, thick clients, and have to display a great deal of information. It's simpler for the user to concentrate on one server, but they can look at any of the servers by selecting a different one from the list. Depending on the user's job function, they may want to track multiple servers, and in that case they can open more than one client on the same host and select a different server in each one.


Thank You

Alex

Roland Weber <ht...@dubioso.net> wrote:

Hello Alex,

Please excuse me for not going into all the details.

> Here's how my application uses the library. I may have up to 100 Applet
> clients running at the same time on various hosts. These clients need to
> display information from different servers, but because applets can only
> connect back to the server from which they were loaded, they ask one
> central server to give them data from the remote server they're interested
> in.

Unsigned applets can only connect to the server they come from.
IIRC, signed applets can do more. Applet signing certificates are
not cheap to come by, but if it saves you from implementing a complex
proxy on the server and buying bigger server hardware for the additional
load, it might still be worthwhile.

> Applets use HttpClient to connect to CentralServer, and include the URI
> of the remote server from which they need to get data as part of the
> request. CentralServer uses a static HttpClient instance to pass the
> Applet's request on to the other servers. There may be up to 20 such remote
> servers. Each client can manage only one remote server at a time, so it'll
> connect only to that one server.

Why that? HttpClient has no such restriction. Is this a problem of
your environment?

> Each client can, however, make multiple
> concurrent requests. So if 20 clients all decide to look at the same server
> and make 5 concurrent requests each, the CentralServer will get hit with
> 100 requests, all for the same remote server. Of course, the other clients
> will still keep asking for data from other servers.

So instead of passing requests to as many servers as possible,
you push one server into overload and let the other 19 run idle?
Maybe I'm missing some piece of the puzzle, but this sounds like
a very inefficient way of managing the workload.

> I'm trying to figure out the best way of using HttpClient,
> MultiThreadedHttpConnectionManager, pool sizes, and HostConfigurations to
> make my CentralServer (the component that sits in the middle and
> distributes requests from clients to remote servers) as efficient as
> possible. I've been through tutorials and mailing list archives, but I
> still can't quite figure out all the relationships between these concepts.

I'll try to summarize the ideas. Forget about HostConfiguration;
you typically use it only to configure a proxy. The objects are
very lightweight, so you cannot save significant time there.

Take 1 HttpClient with 1 MultiThreadedHttpConnectionManager (MTHCM).
If there are cookies coming from the servers, you will have to create
and keep a separate HttpState for every client of your CentralServer.
Or you use an empty HttpState for every request and throw it away
afterwards. Don't share HttpState between different client sessions.

MTHCM has two limits you can adjust. MaxTotalConnections limits the
number of outgoing connections in total. You choose that limit based
on:
- the number of sockets you want to have open
- the number of service threads in CentralServer
- available network bandwidth
- other resource limits on the machine running CentralServer

MaxConnectionsPerHost limits the number of outgoing connections to
a single server. You can only set a common limit for all servers.
HTTP specification requires that no user agent opens more than 2
simultaneous connections to a single host. Proxies, such as your
CentralServer, are allowed to open 2 simultaneous connections for
each client that tries to reach a server.
If you're in a closed environment, and all participants (above all
the operators of the servers you are connecting to) agree, you can
of course ignore such limits.
You choose the MaxConnectionsPerHost limits based on the capacity
of the servers you are connecting to. If you know that server X
has only 10 service threads, there is no point in sending 100
requests there at the same time. You'd allow 10, or maybe 20 to
avoid round-trip latency, but no more. Clients are better off being
blocked in CentralServer, leaving the remaining (total) connections
available for requests to the other 19 servers.
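To picture how the two limits interact: a connection is handed out only if both a per-host permit and a global permit are available. The following toy model with plain semaphores mirrors the pool's accounting; it is an illustration, not the actual HttpClient implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class ConnectionLimits {
    private final Semaphore total;
    private final int maxPerHost;
    private final Map<String, Semaphore> perHost = new ConcurrentHashMap<>();

    ConnectionLimits(int maxTotal, int maxPerHost) {
        this.total = new Semaphore(maxTotal);
        this.maxPerHost = maxPerHost;
    }

    /** Tries to reserve a connection slot for the host; false if either limit is hit. */
    boolean tryAcquire(String host) {
        Semaphore hostSem = perHost.computeIfAbsent(host, h -> new Semaphore(maxPerHost));
        if (!hostSem.tryAcquire()) {
            return false;              // per-host limit reached
        }
        if (!total.tryAcquire()) {
            hostSem.release();         // roll back: global limit reached
            return false;
        }
        return true;
    }

    void release(String host) {
        perHost.get(host).release();
        total.release();
    }

    public static void main(String[] args) {
        // MaxTotalConnections = 3, MaxHostConnections = 2
        ConnectionLimits limits = new ConnectionLimits(3, 2);
        System.out.println(limits.tryAcquire("chef"));    // true
        System.out.println(limits.tryAcquire("chef"));    // true
        System.out.println(limits.tryAcquire("chef"));    // false: per-host cap of 2
        System.out.println(limits.tryAcquire("cartman")); // true
        System.out.println(limits.tryAcquire("kenny"));   // false: total cap of 3
        limits.release("chef");
        System.out.println(limits.tryAcquire("kenny"));   // true: freed slot shifts hosts
    }
}
```

Note how releasing a permit for one host immediately lets a request for another host proceed; there is one global budget, and the per-host caps only carve slices out of it.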

> In CentralServer, all http requests are made through the one static
> instance of HttpClient. CentralServer creates a new PostMethod for every
> request. Code from "CentralServer" servlet is below, and after that I ask
> specific questions.

Sorry, I'm not in the mood for code reviews.

> Questions:
> 
> 1. Since I have a finite set of remote URIs, does it buy me anything to
> create HostConfiguration objects for each server and use them when I make
> requests?

No.

> 2. Since I have a finite number of servers, does it make sense to use
> per-host connection pool size? Or is it just as good to have one big pool
> of connections?

One HttpClient means one MTHCM means one pool.
One big pool is better than individual pools.

> 3. Is the following statement true: <Number of Hosts> *
> <MaxHostConnections> must be <= MaxTotalConnections?

No. In your scenario, #hosts * MaxConnPerHost is the maximum number
of connections you could have open. Limits are there to _reduce_
that number, in order to avoid overload situations. It is better
to process some request within load limits and keep the others
waiting than to overload the machine. Do you have enough service
threads in the first place to open that many connections? If so,
you should reduce that number; with 100 requests and 20 servers at
the same time, you'll almost surely overload CentralServer.

> 4. Can I set MaxHostConnections to 100 and MaxTotalConnections to 200 and
> still connect to 20 hosts? Will the connections shift from connection pool
> for one host to connection pool of another as needed?

Yes, depending on the requests coming in. You can have 10 connections to
each of the 20 hosts. You can have 100 connections to one host and split
the remaining 100 between the other 19 hosts. But 100 connections per host
sounds like a very high number to me. There is no point in opening those
connections if the requests will just queue up at that host. If that is
the case, it would be better to queue them at CentralServer, so that
fewer connections can be reused for sending the other requests when the
server is ready to serve them.
There is only one connection pool. Yes, connections will be reassigned
from one host to another in that pool.


> 5. Is the purpose of MaxHostConnections [...]

see above


> 6. I wanted to monitor pool size, so I periodically print the value of
> httpConnectionManager.getConnectionsInPool(), but I noticed that it never
> shrinks, even though I'm running the IdleConnectionTimeoutThread. I figured
> out that to shrink it, I have to call
> httpConnectionManager.deleteClosedConnections().

Check the timeout setting. Check the workload. You don't need to call
deleteClosedConnections(); it is called by closeIdleConnections().

> (a) Do calls to closeIdleConnections completely release all system
> resources used by HttpConnection objects it closes? Meaning sockets, of
> course.

No. The sockets get closed, that's as much as we can do. We have had
reports that sockets can still hang around on the operating system
level in a CLOSE_WAIT state. Don't know what that means, I'm not a
TCP/IP expert.

> (b) Is there a way to determine how many of the connections in a pool are
> closed (for tracking purposes)? I mean, an existing way, other than
> extending some class in the library?

None that I know of. We'll add monitoring in HttpConn 4 or 5,
sooner or later ;-)

> (c) Is it a good idea to call
> httpConnectionManager.deleteClosedConnections() once in a while? If an
> HttpConnection is not deleted, will it hold any resources, other than the
> memory it occupies?

See above. No.

> (d) Are there advantages to keeping closed connections in the pool? Is it
> faster to open an existing closed connection then to create a new one and
> open it?

No. They will be thrown away anyway.


hope that helps,
  Roland




