You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hc.apache.org by "Carl A. Dunham" <ht...@strategyforward.com> on 2003/03/20 20:43:39 UTC

CRLF and Connection: close

First, let me say that finding HttpClient was this week's bacon-saver. The 
other alternatives out there are, shall we say politely, a bit lacking.

I have, however, run into a couple of things. Apologies if these have been 
covered before. I tried searching the archives, and found some close hits, but 
none dead on.

The application I'm working on is a simple page downloader, not really a 
spider, just downloads pages from a list and parses the text out of the html. 
So, it ends up having to deal with a bunch of less-than-ideal page constructs, 
and generally will not visit the same site more than once. It also will visit 
thousands of sites in a session.

The first thing has to do with the specs for Internet text messages (RFC 822), 
which HTTP messages are. Apparently this is not really followed, at least in 
terms of line termination. It is pretty clear that every line is supposed to 
end with CRLF (\r\n), yet even a brief look at real-world messages will show 
you that this is routinely ignored. In fact, it seems that a majority of 
messages are terminated only with LF (\n).

Where this comes into play is in HttpParser.readLine() and readRawLine(), 
which assume standards-compliant messages. Fortunately, most of the time the 
non-compliant messages work, because they are transmitted as single lines, so 
inputStream.read() returns -1 at the end, and everything is fine. However, 
occasionally this will not work, for example if the server-side code has 
something like:

setHeader("Something: value\nSomething-else: value\n\n");

depending on the implementation, unexpected things can happen. Here is an 
example from a log:

2003/03/20 00:24:58:765 EST [TRACE] HttpParser - -enter HttpConnection.readLine()
2003/03/20 00:24:58:765 EST [TRACE] HttpParser - -enter HttpConnection.readRawLine()
2003/03/20 00:24:58:769 EST [DEBUG] wire - -<< "Content-type: text/html
Page-Completion-Status: Normal

 
" [\r\n]
2003/03/20 00:24:58:769 EST [TRACE] HttpParser - -enter HttpConnection.readLine()
2003/03/20 00:24:58:769 EST [TRACE] HttpParser - -enter HttpConnection.readRawLine()
2003/03/20 00:24:58:770 EST [DEBUG] wire - -<< "<html> 
" [\r\n]
2003/03/20 00:24:58:772 EST [WARN] HttpMethod - -Recoverable exception caught when reading response
2003/03/20 00:24:58:773 EST [DEBUG] HttpMethod - -Closing the connection.

The recoverable exception was an invalid header line "<html>".

I can provide example URLs, if this will help.

The second thing has to do with how Keep-alive connections behave. This is a 
multi-threaded app, using MultiThreadedHttpConnectionManager. It works great, 
however I don't get much benefit of the shared Connections, because I'm not 
connecting to the same site more than once, generally. That's OK, the problem 
I run into is that after running for not very long, I suddenly start getting 
everything timing out. It's hard to really pinpoint the timing, giving all the 
activity, and no thread identifiers in the log messages, but I think what is 
happening is that the system is simply running out of file handles or 
system-level connections. A quick "netstat -n" shows a whole bunch of open, 
TIME_WAIT, and other connections. It seems that the Connection Manager is 
keeping them around for re-use, and following HTTP/1.1. One fix was to send 
"Connection: close" as a RequestHeader, which really fixed things up, but now 
I am running into sites that are not responding, and not timing out. The log 
traces into ReadRawLine() and just sits there. I am still tracking this down, 
I just wonder if anyone else has seen this also?

Well, sorry to be so long-winded, and thanks!

Carl

Re: CRLF and Connection: close

Posted by "Carl A. Dunham" <ca...@carldunham.com>.
BTW, here is a quick patch to "degrade" the code to work with these types of 
messages:


*** HttpParser.java~    Tue Feb 25 10:22:00 2003
--- HttpParser.java     Thu Mar 20 02:06:02 2003
***************
*** 53,61 ****
                      break;
                  }
                  buf.write(ch);
!                 if (ch == '\n') {
!                     break;
!                 }
              }
          }
          if (WIRE_LOG.isDebugEnabled()) {
--- 53,61 ----
                      break;
                  }
                  buf.write(ch);
!             }
!             if (ch == '\n') {
!                 break;
              }
          }
          if (WIRE_LOG.isDebugEnabled()) {
***************
*** 88,93 ****
--- 88,95 ----
          int len = rawdata.length;
          if (( len >= 2) && (rawdata[len - 2] == '\r') && (rawdata[len - 1] == '\n')) {
              return HttpConstants.getString(rawdata, 0, rawdata.length - 2);
+         } else if (( len >= 1) && (rawdata[len - 1] == '\n')) {
+             return HttpConstants.getString(rawdata, 0, rawdata.length - 1);
          } else {
              return HttpConstants.getString(rawdata);


Note that the wire log message still needs to be changed, if this is an appropriate 
fix.

Thanks!


Re: CRLF and Connection: close

Posted by "Carl A. Dunham" <ht...@strategyforward.com>.
BTW, here is a quick patch to "degrade" the code to work with these types of 
messages:


*** HttpParser.java~    Tue Feb 25 10:22:00 2003
--- HttpParser.java     Thu Mar 20 02:06:02 2003
***************
*** 53,61 ****
                      break;
                  }
                  buf.write(ch);
!                 if (ch == '\n') {
!                     break;
!                 }
              }
          }
          if (WIRE_LOG.isDebugEnabled()) {
--- 53,61 ----
                      break;
                  }
                  buf.write(ch);
!             }
!             if (ch == '\n') {
!                 break;
              }
          }
          if (WIRE_LOG.isDebugEnabled()) {
***************
*** 88,93 ****
--- 88,95 ----
          int len = rawdata.length;
          if (( len >= 2) && (rawdata[len - 2] == '\r') && (rawdata[len - 1] 
== '\n')) {
              return HttpConstants.getString(rawdata, 0, rawdata.length - 2);
+         } else if (( len >= 1) && (rawdata[len - 1] == '\n')) {
+             return HttpConstants.getString(rawdata, 0, rawdata.length - 1);
          } else {
              return HttpConstants.getString(rawdata);


Note that the wire log message still needs to be changed, if this is an 
appropriate 
fix.

Thanks!


Re: CRLF and Connection: close

Posted by "Carl A. Dunham" <ht...@strategyforward.com>.
On Thursday March 20 2003 17:43, Oleg Kalnichevski wrote:
> Carl
>
> First of all, many thanks for bringing these issues up

No problem. It's nice to see such positive and quick reponses!

>
> > I can provide example URLs, if this will help.
>
> Please do so. It will help me testing

http://www.screensaver.com/index.cfm


> Cheers
>
> Oleg
>

Thanks!

Carl


Re: CRLF and Connection: close

Posted by Oleg Kalnichevski <o....@dplanet.ch>.
Carl

First of all, many thanks for bringing these issues up

> The first thing has to do with the specs for Internet text messages (RFC 822), 
> which HTTP messages are. Apparently this is not really followed, at least in 
> terms of line termination. It is pretty clear that every line is supposed to 
> end with CRLF (\r\n), yet even a brief look at real-world messages will show 
> you that this is routinely ignored. In fact, it seems that a majority of 
> messages are terminated only with LF (\n).
> 

Clearly, this problem needs to be addressed and the code in
HttpConnection#readRawLine & HttpConnection#readLine needs to be made
more robust when dealing with RFC822 non-compliant HTTP servers. I'll
work on a fix tomorrow (it's getting kind of late here)

<...>


> I can provide example URLs, if this will help.
> 

Please do so. It will help me testing 


> The second thing has to do with how Keep-alive connections behave. This is a 
> multi-threaded app, using MultiThreadedHttpConnectionManager. It works great, 
> however I don't get much benefit of the shared Connections, because I'm not 
> connecting to the same site more than once, generally. That's OK, the problem 
> I run into is that after running for not very long, I suddenly start getting 
> everything timing out. It's hard to really pinpoint the timing, giving all the 
> activity, and no thread identifiers in the log messages, but I think what is 
> happening is that the system is simply running out of file handles or 
> system-level connections. A quick "netstat -n" shows a whole bunch of open, 
> TIME_WAIT, and other connections. It seems that the Connection Manager is 
> keeping them around for re-use, and following HTTP/1.1. One fix was to send 
> "Connection: close" as a RequestHeader, which really fixed things up, but now 
> I am running into sites that are not responding, and not timing out. The log 
> traces into ReadRawLine() and just sits there. I am still tracking this down, 
> I just wonder if anyone else has seen this also?
> 

The definitive authority on this issue would be Mike Becke. Let's wait
for his comment on it

Cheers

Oleg




Re: CRLF and Connection: close

Posted by "Carl A. Dunham" <ht...@strategyforward.com>.
Simon,

On Thursday March 20 2003 17:27, Simon Roberts wrote:
> Are you calling HttpMethod.releaseConnection once you've finished with
> each HttpMethod?  If you don't do that, HttpClient will hold the
> connections until a garbage-collect reaps the connections.

Yes, but as I read it, this only returns it to the connectionPool, leaving the 
connection physically still open. The actual system connection does not seem 
be freed until the next gc, presumably (and I seem to recall that being 
unreliable in Java in the past, although that might be ancient history). 
Unless I'm just missing it.

> That doesn't sound like the same problem. Which version are you running?
> There
> have been some recent changes with the way timeouts are dealt with.

This is ALPHA3. I will try the latest CVS version and see if it behaves the 
same way.

>
> Cheers, Simon
>

Thanks!

Carl


Re: CRLF and Connection: close

Posted by Simon Roberts <si...@fifthweb.net>.
Carl,

>         That's OK, the problem
> I run into is that after running for not very long, I suddenly start
getting
> everything timing out. It's hard to really pinpoint the timing, giving all
the
> activity, and no thread identifiers in the log messages, but I think what
is
> happening is that the system is simply running out of file handles or
> system-level connections.

Are you calling HttpMethod.releaseConnection once you've finished with
each HttpMethod?  If you don't do that, HttpClient will hold the connections
until a garbage-collect reaps the connections.

> A quick "netstat -n" shows a whole bunch of open,
> TIME_WAIT, and other connections. It seems that the Connection Manager is
> keeping them around for re-use, and following HTTP/1.1. One fix was to
send
> "Connection: close" as a RequestHeader, which really fixed things up, but
now
> I am running into sites that are not responding, and not timing out. The
log
> traces into ReadRawLine() and just sits there. I am still tracking this
down,
> I just wonder if anyone else has seen this also?

That doesn't sound like the same problem. Which version are you running?
There
have been some recent changes with the way timeouts are dealt with.

Cheers, Simon


Re: CRLF and Connection: close

Posted by Michael Becke <be...@u.washington.edu>.
Hi Carl,

Thanks for the patch.

 From looking at your chanegs and going over options in my head it may 
be difficult to handle these two cases at the same time.  The two cases 
being 1) max connections per host and 2) max total connections.  I am 
going to look at your patch some more.  At the moment I am leaning more 
towards having different connection pools.  It seems to make more sense, 
particularly for your case when there is no need to ever reuse a connection.

For future development, post 2.0, me might want to take a look at the 
commons Pool package.  It does a lot of this work, except that it does 
not seem to handle releasing objects that get garbage collected.

I will try to post a patch tonight.

Mike

Carl A. Dunham wrote:
> Mike,
> 
> I did a fairly rough version of your first suggestion. It basically adds a 
> list of free connections across hosts, and keeps a seperate counter. It seems 
> to be working pretty well, although it should probably use a smarter 
> algorithm for determining which connection to close, like the one least often 
> used, and with no one waiting to get it, etc.
> 
> This *might* be useable for both types of uses, although it would suffer a 
> slight performance hit for users who didn't care about the overall number of 
> connections, because now so much is synchronizing on the global 
> ConnectionPool.
> 
> Anyway, thanks for the help, everyone. Both of my original problems seem to be  
> well on their way to being non-problems!
> 
> Next stop, cookies...
> 
> 
> On Thursday March 20 2003 23:33, Michael Becke wrote:
> 
> 
>>The problem, as other have been hinting at, is that connections never
>>get destroyed.  By default, a max of two connections are created per
>>HostConfiguration.  If you are never connecting to the same host more
>>than once this is a bit of a problem.  The
>>MultiThreadedHttpConnectionManager was designed to reuse connections
>>for a small number of hosts.  So as you have guessed the connection
>>manager will continue to create connections to new hosts and will never
>>reclaim the old ones.  We have a couple of options for fixing this,
>>here are a few:
>>
>>- create a cap on the max number of connections that can be created.
>>once reached unused connections will be reclaimed
>>- implement some kind of connection cache/timeout.  only connections
>>that have been used in the last X milliseconds will be kept
>>- implement a new type of connection manager that better fits this kind
>>of process.  in particular it would focus less on connections per host,
>>but more on total consecutive connections.  in general we could
>>introduce a whole new set of connection managers they are optimized for
>>different use scenarios
>>
>>These are just a few ideas.  What do others think?
>>
>>Mike
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail:
>>commons-httpclient-dev-unsubscribe@jakarta.apache.org For additional
>>commands, e-mail: commons-httpclient-dev-help@jakarta.apache.org
>>
>>
>>------------------------------------------------------------------------
>>
>>Index: MultiThreadedHttpConnectionManager.java
>>===================================================================
>>RCS file: /home/cvspublic/jakarta-commons/httpclient/src/java/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.java,v
>>retrieving revision 1.12
>>diff -u -r1.12 MultiThreadedHttpConnectionManager.java
>>--- MultiThreadedHttpConnectionManager.java	6 Mar 2003 07:49:03 -0000	1.12
>>+++ MultiThreadedHttpConnectionManager.java	21 Mar 2003 06:50:58 -0000
>>@@ -95,14 +95,15 @@
>>     private static final Log LOG = LogFactory.getLog(MultiThreadedHttpConnectionManager.class);
>> 
>>     // ----------------------------------------------------- Instance Variables
>>-    /**
>>-     * Map where keys are {@link HostConfiguration}s and values are {@link
>>-     * HostConnectionPool}s
>>-     */
>>-    private final Map mapHosts = new HashMap();
>> 
>>-    /** Maximum number of connections allowed */
>>-    private int maxConnections = 2;   // Per RFC 2616 sec 8.1.4
>>+    /** Maximum number of connections allowed per host */
>>+    private int maxHostConnections = 2;   // Per RFC 2616 sec 8.1.4
>>+
>>+    /** Maximum number of connections allowed overall */
>>+    private int maxTotalConnections = 20;
>>+
>>+    /** Connection Pool */
>>+    private ConnectionPool connectionPool;
>> 
>>     /** mapping from reference to hostConfiguration */
>>     private Map referenceToHostConfig;
>>@@ -119,6 +120,8 @@
>>     public MultiThreadedHttpConnectionManager() {
>> 
>>         this.referenceToHostConfig = Collections.synchronizedMap(new HashMap());
>>+        this.connectionPool = new ConnectionPool();
>>+        
>>         this.referenceQueue = new ReferenceQueue();
>> 
>>         new ReferenceQueueThread().start();
>>@@ -129,11 +132,11 @@
>>      * Sets the maximum number of connections allowed for a given
>>      * HostConfiguration. Per RFC 2616 section 8.1.4, this value defaults to 2.
>>      *
>>-     * @param maxConnections the number of connections allowed for each
>>+     * @param maxHostConnections the number of connections allowed for each
>>      * hostConfiguration
>>      */
>>-    public void setMaxConnectionsPerHost(int maxConnections) {
>>-        this.maxConnections = maxConnections;
>>+    public void setMaxConnectionsPerHost(int maxHostConnections) {
>>+        this.maxHostConnections = maxHostConnections;
>>     }
>> 
>>     /**
>>@@ -144,7 +147,25 @@
>>      * hostConfiguration.
>>      */
>>     public int getMaxConnectionsPerHost() {
>>-        return maxConnections;
>>+        return maxHostConnections;
>>+    }
>>+
>>+    /**
>>+     * Sets the maximum number of connections allowed in the system.
>>+     *
>>+     * @param maxHostConnections the number of connections allowed
>>+     */
>>+    public void setMaxTotalConnections(int maxTotalConnections) {
>>+        this.maxTotalConnections = maxTotalConnections;
>>+    }
>>+
>>+    /**
>>+     * Gets the maximum number of connections allowed in the system.
>>+     *
>>+     * @return The maximum number of connections allowed
>>+     */
>>+    public int getMaxTotalConnections() {
>>+        return maxTotalConnections;
>>     }
>> 
>>     /**
>>@@ -182,12 +203,7 @@
>>                 + hostConfiguration + ", timeout = " + timeout);
>>         }
>> 
>>-        // we get the connection pool with a clone of the hostConfiguration
>>-        // so that it cannot be changed once the connecton has been retrieved
>>-        final HttpConnection conn 
>>-            = getConnection(getConnectionPool(new HostConfiguration(hostConfiguration)),
>>-            hostConfiguration, timeout
>>-        );
>>+        final HttpConnection conn = doGetConnection(hostConfiguration, timeout);
>> 
>>         // wrap the connection in an adapter so we can ensure it is used 
>>         // only once
>>@@ -197,9 +213,9 @@
>>     /**
>>      * Gets a connection or waits if one is not available.  A connection is
>>      * available if one exists that is not being used or if fewer than
>>-     * maxConnections have been created in the connectionPool.
>>+     * maxHostConnections have been created in the connectionPool, and fewer
>>+     * than maxTotalConnections have been created in all connectionPools.
>>      *
>>-     * @param connectionPool The connection pool to use.
>>      * @param hostConfiguration The host configuration.
>>      * @param timeout the number of milliseconds to wait for a connection, 0 to
>>      * wait indefinitely
>>@@ -209,84 +225,71 @@
>>      * @throws HttpException if a connection does not become available in
>>      * 'timeout' milliseconds
>>      */
>>-    private HttpConnection getConnection(HostConnectionPool connectionPool,
>>-        HostConfiguration hostConfiguration, long timeout) throws HttpException {
>>+    private HttpConnection doGetConnection(HostConfiguration hostConfiguration, 
>>+        long timeout) throws HttpException {
>> 
>>         HttpConnection connection = null;
>> 
>>         synchronized (connectionPool) {
>> 
>>-            // keep trying until a connection is available, should happen at
>>-            // most twice
>>+            TimeoutThread threadTimeout = new TimeoutThread();
>>+            threadTimeout.setTimeout(timeout);
>>+            threadTimeout.setWakeupThread(Thread.currentThread());
>>+            threadTimeout.start();
>>+                    
>>+            // we clone the hostConfiguration
>>+            // so that it cannot be changed once the connection has been retrieved
>>+
>>+            hostConfiguration = new HostConfiguration(hostConfiguration);
>>+            HostConnectionPool hostPool = connectionPool.getHostPool(hostConfiguration);
>>+
>>             while (connection == null) {
>> 
>>-                if (connectionPool.freeConnections.size() > 0) {
>>-                    connection = (HttpConnection) connectionPool
>>-                        .freeConnections.removeFirst();
>>+                // happen to have a free connection with the right specs
>>+                //
>>+                if (hostPool.freeConnections.size() > 0) {
>>+                    connection = connectionPool.getFreeConnection(hostConfiguration);
>>+
>>+                // have room to make more
>>+                //
>>+                } else if ((hostPool.numConnections < maxHostConnections) &&
>>+                           (connectionPool.numConnections < maxTotalConnections)) {
>>+
>>+                    connection = connectionPool.createConnection(hostConfiguration);
>>+
>>+                // have room to add host connection, and there is at least one free
>>+                // connection that can be liberated to make overall room
>>+                //
>>+                } else if ((hostPool.numConnections < maxHostConnections) &&
>>+                           (connectionPool.freeConnections.size() > 0)) {
>>+
>>+                    connectionPool.deleteLeastUsedConnection();
>>+                    connection = connectionPool.createConnection(hostConfiguration);
>>+
>>+                // otherwise, we have to wait for one of the above conditions to
>>+                // become true
>>+                //
>>                 } else {
>>-                    // get number of connections hostConfig
>>-                    if (connectionPool.numConnections < maxConnections) {
>>-                        // Create a new connection
>>-                        connection = new HttpConnection(hostConfiguration);
>>-                        connection.setHttpConnectionManager(this);
>>-                        connectionPool.numConnections++;
>>-
>>-                        // add a weak reference to this connection
>>-                        referenceToHostConfig.put(new WeakReference(connection, referenceQueue),
>>-                            hostConfiguration);
>>-
>>-                    } else {
>>-
>>-                        TimeoutThread threadTimeout = new TimeoutThread();
>>-                        threadTimeout.setTimeout(timeout);
>>-                        threadTimeout.setWakeupThread(Thread.currentThread());
>>-                        threadTimeout.start();
>>-
>>-                        try {
>>-                            LOG.debug(
>>-                                "HttpConnectionManager.getConnection:  waiting for "
>>-                                + "connection from " + connectionPool
>>-                            );
>>-                            connectionPool.wait();
>>-                            // we were woken up before the timeout occurred, so
>>-                            // there should be a connection available
>>-                            threadTimeout.interrupt();
>>-                        } catch (InterruptedException e) {
>>-                            throw new HttpException("Timeout waiting for connection.");
>>-                        }
>>+                    // todo: keep track of which hostConfigurations have waiting
>>+                    // threads, so they avoid being sacrificed before necessary
>> 
>>+                    try {
>>+                        LOG.debug(
>>+                            "HttpConnectionManager.getConnection:  waiting for "
>>+                            + "connection from " + connectionPool
>>+                            );
>>+                        connectionPool.wait();
>>+                    } catch (InterruptedException e) {
>>+                        throw new HttpException("Timeout waiting for connection.");
>>                     }
>>                 }
>>             }
>>-
>>+            threadTimeout.interrupt();
>>         }
>>-
>>         return connection;
>>     }
>> 
>>     /**
>>-     * Get the pool (list) of connections available for the given hostConfig.
>>-     *
>>-     * @param hostConfiguration the configuraton for the connection pool
>>-     * @return a pool (list) of connections available for the given config
>>-     */
>>-    private HostConnectionPool getConnectionPool(HostConfiguration hostConfiguration) {
>>-        LOG.trace("enter HttpConnectionManager.getConnections(String)");
>>-
>>-        // Look for a list of connections for the given config
>>-        HostConnectionPool listConnections = null;
>>-        synchronized (mapHosts) {
>>-            listConnections = (HostConnectionPool) mapHosts.get(hostConfiguration);
>>-            if (listConnections == null) {
>>-                // First time for this config
>>-                listConnections = new HostConnectionPool();
>>-                mapHosts.put(hostConfiguration, listConnections);
>>-            }
>>-        }
>>-        return listConnections;
>>-    }
>>-
>>-    /**
>>      * Get the number of connections in use for this configuration.
>>      *
>>      * @param hostConfiguration the key that connections are tracked on
>>@@ -295,9 +298,9 @@
>>     public int getConnectionsInUse(HostConfiguration hostConfiguration) {
>>         LOG.trace("enter HttpConnectionManager.getConnectionsInUse(String)");
>> 
>>-        HostConnectionPool connectionPool = getConnectionPool(hostConfiguration);
>>         synchronized (connectionPool) {
>>-            return connectionPool.numConnections;
>>+            HostConnectionPool hostPool = connectionPool.getHostPool(hostConfiguration);
>>+            return hostPool.numConnections;
>>         }
>> 
>>     }
>>@@ -323,29 +326,166 @@
>>         // make sure that the response has been read.
>>         SimpleHttpConnectionManager.finishLastResponse(conn);
>> 
>>+        connectionPool.freeConnection(conn);
>>+    }
>>+
>>+
>>+    private HostConfiguration configurationForConnection(HttpConnection conn) {
>>+
>>         HostConfiguration connectionConfiguration = new HostConfiguration();
>>         connectionConfiguration.setHost(conn.getHost(), 
>>-            conn.getPort(), conn.getProtocol());
>>+                                        conn.getPort(), conn.getProtocol());
>>         if (conn.getProxyHost() != null) {
>>             connectionConfiguration.setProxy(conn.getProxyHost(), conn.getProxyPort());
>>         }
>> 
>>-        if (LOG.isDebugEnabled()) {
>>-            LOG.debug("HttpConnectionManager.releaseConnection:  Release connection for " 
>>-                + connectionConfiguration);
>>+        return connectionConfiguration;
>>+    }
>>+    
>>+
>>+    /**
>>+     * Global Connection Pool, including per-host pools
>>+     */
>>+    private class ConnectionPool {
>>+        /** The list of free connections */
>>+        private LinkedList freeConnections = new LinkedList();
>>+
>>+        /**
>>+         * Map where keys are {@link HostConfiguration}s and values are {@link
>>+         * HostConnectionPool}s
>>+         */
>>+        private final Map mapHosts = new HashMap();
>>+
>>+        /** The number of created connections */
>>+        private int numConnections = 0;
>>+
>>+
>>+        /**
>>+         * Create a new connection
>>+         *
>>+         * @param hostConfiguration the configuration for the connection
>>+         * @return a new connection
>>+         */
>>+        private synchronized HttpConnection createConnection(HostConfiguration hostConfiguration) {
>>+            HttpConnection connection = null;
>>+
>>+            HostConnectionPool hostPool = getHostPool(hostConfiguration);
>>+
>>+            if ((hostPool.numConnections < maxHostConnections) &&
>>+                (numConnections < maxTotalConnections)) {
>>+
>>+                connection = new HttpConnection(hostConfiguration);
>>+                connection.setHttpConnectionManager(MultiThreadedHttpConnectionManager.this);
>>+                connectionPool.numConnections++;
>>+                hostPool.numConnections++;
>>+        
>>+                // add a weak reference to this connection
>>+                referenceToHostConfig.put(new WeakReference(connection, referenceQueue),
>>+                                          hostConfiguration);
>>+            }
>>+            return connection;
>>+        }
>>+    
>>+
>>+        /**
>>+         * Get the pool (list) of connections available for the given hostConfig.
>>+         *
>>+         * @param hostConfiguration the configuraton for the connection pool
>>+         * @return a pool (list) of connections available for the given config
>>+         */
>>+        public synchronized HostConnectionPool getHostPool(HostConfiguration hostConfiguration) {
>>+            LOG.trace("enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)");
>>+
>>+            // Look for a list of connections for the given config
>>+            HostConnectionPool listConnections = null;
>>+            synchronized (mapHosts) {
>>+                listConnections = (HostConnectionPool) mapHosts.get(hostConfiguration);
>>+                if (listConnections == null) {
>>+                    // First time for this config
>>+                    listConnections = new HostConnectionPool();
>>+                    mapHosts.put(hostConfiguration, listConnections);
>>+                }
>>+            }
>>+            return listConnections;
>>+        }
>>+
>>+        /**
>>+         * If available, get a free connection for this host
>>+         *
>>+         * @param hostConfiguration the configuraton for the connection pool
>>+         * @return an available connection for the given config
>>+         */
>>+        public synchronized HttpConnection getFreeConnection(HostConfiguration hostConfiguration) {
>>+
>>+            HttpConnection connection = null;
>>+            
>>+            HostConnectionPool hostPool = getHostPool(hostConfiguration);
>>+
>>+            if (hostPool.freeConnections.size() > 0) {
>>+                connection = (HttpConnection) hostPool.freeConnections.removeFirst();
>>+                freeConnections.remove(connection);
>>+            }
>>+            return connection;
>>         }
>> 
>>-        final HostConnectionPool listConnections = getConnectionPool(connectionConfiguration);
>>-        synchronized (listConnections) {
>>-            // Put the connect back in the available list and notify a waiter
>>-            listConnections.freeConnections.addFirst(conn);
>>-            if (listConnections.numConnections == 0) {
>>-                // for some reason this connection pool didn't already exist
>>-                LOG.error("connection pool not found for: " 
>>-                    + connectionConfiguration);
>>-                listConnections.numConnections = 1;
>>+        /**
>>+         * Close and delete an old, unused connection to make room for a new one.
>>+         */
>>+        public synchronized void deleteLeastUsedConnection() {
>>+
>>+            // todo: don't just take the oldest one
>>+            //
>>+            HttpConnection connection = (HttpConnection) freeConnections.removeFirst();
>>+
>>+            if (connection != null) {
>>+                HostConfiguration connectionConfiguration = configurationForConnection(connection);
>>+
>>+                if (LOG.isDebugEnabled()) {
>>+                    LOG.debug("HttpConnectionManager.ConnectionPool.deleteLeastUsedConnection:  Closing least used connection for " 
>>+                              + connectionConfiguration);
>>+                }
>>+
>>+                connection.close();
>>+                
>>+                HostConnectionPool hostPool = getHostPool(connectionConfiguration);
>>+
>>+                hostPool.freeConnections.remove(connection);
>>+                hostPool.numConnections--;
>>+                numConnections--;
>>             }
>>-            listConnections.notify();
>>+        }
>>+
>>+        public void freeConnection(HttpConnection conn) {
>>+
>>+            HostConfiguration connectionConfiguration = configurationForConnection(conn);
>>+
>>+            if (LOG.isDebugEnabled()) {
>>+                LOG.debug("HttpConnectionManager.ConnectionPool.freeConnection:  Freeing connection for " 
>>+                          + connectionConfiguration);
>>+            }
>>+
>>+            synchronized (this) {
>>+                final HostConnectionPool hostPool = getHostPool(connectionConfiguration);
>>+
>>+                // Put the connect back in the available list and notify a waiter
>>+                hostPool.freeConnections.add(conn);
>>+                if (hostPool.numConnections == 0) {
>>+                    // for some reason this connection pool didn't already exist
>>+                    LOG.error("host connection pool not found for: " 
>>+                              + connectionConfiguration);
>>+                    hostPool.numConnections = 1;
>>+                }
>>+
>>+                freeConnections.add(conn);
>>+                if (numConnections == 0) {
>>+                    // for some reason this connection pool didn't already exist
>>+                    LOG.error("connection pool not found for: " 
>>+                              + connectionConfiguration);
>>+                    numConnections = 1;
>>+                }
>>+                notify();
>>+            }
>>+
>>         }
>>     }
>> 
>>@@ -388,8 +528,9 @@
>>                         HostConfiguration config = (HostConfiguration)
>>                             referenceToHostConfig.get(ref);
>>                         referenceToHostConfig.remove(ref);
>>-                        HostConnectionPool connectionPool = getConnectionPool(config);
>>                         synchronized (connectionPool) {
>>+                            HostConnectionPool hostPool = connectionPool.getHostPool(config);
>>+                            hostPool.numConnections--;
>>                             connectionPool.numConnections--;
>>                             connectionPool.notify();
>>                         }
>>
>>
>>
>>------------------------------------------------------------------------
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: commons-httpclient-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: commons-httpclient-dev-help@jakarta.apache.org


Re: CRLF and Connection: close

Posted by "Carl A. Dunham" <ht...@strategyforward.com>.
Mike,

I did a fairly rough version of your first suggestion. It basically adds a 
list of free connections across hosts, and keeps a seperate counter. It seems 
to be working pretty well, although it should probably use a smarter 
algorithm for determining which connection to close, like the one least often 
used, and with no one waiting to get it, etc.

This *might* be useable for both types of uses, although it would suffer a 
slight performance hit for users who didn't care about the overall number of 
connections, because now so much is synchronizing on the global 
ConnectionPool.

Anyway, thanks for the help, everyone. Both of my original problems seem to be  
well on their way to being non-problems!

Next stop, cookies...


On Thursday March 20 2003 23:33, Michael Becke wrote:

> The problem, as other have been hinting at, is that connections never
> get destroyed.  By default, a max of two connections are created per
> HostConfiguration.  If you are never connecting to the same host more
> than once this is a bit of a problem.  The
> MultiThreadedHttpConnectionManager was designed to reuse connections
> for a small number of hosts.  So as you have guessed the connection
> manager will continue to create connections to new hosts and will never
> reclaim the old ones.  We have a couple of options for fixing this,
> here are a few:
>
> - create a cap on the max number of connections that can be created.
> once reached unused connections will be reclaimed
> - implement some kind of connection cache/timeout.  only connections
> that have been used in the last X milliseconds will be kept
> - implement a new type of connection manager that better fits this kind
> of process.  in particular it would focus less on connections per host,
> but more on total consecutive connections.  in general we could
> introduce a whole new set of connection managers they are optimized for
> different use scenarios
>
> These are just a few ideas.  What do others think?
>
> Mike
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> commons-httpclient-dev-unsubscribe@jakarta.apache.org For additional
> commands, e-mail: commons-httpclient-dev-help@jakarta.apache.org

Re: CRLF and Connection: close

Posted by Eric Johnson <er...@tibco.com>.
Mike,

I like the idea of a cap on the total number of connections as a 
configurable default.

Perhaps HttpClient doesn't need to implicitly perform connection 
recycling on idle connections, but how about adding an explicit method 
on MultiThreadedConnectionManager that clients can call - something to 
the effect of "closeConnectionsOlderThan(...)".  Then clients can call 
invoke the cleanup in the way that they desire.  I cringe at the idea of 
adding a thread just to implicitly perform the close.  I suppose, as an 
alternative, we could simply look for idle threads whenever a connection 
gets returned to the pool, but that might be too blunt an instrument for 
Carl.

One last thought that occurred to me - what if we added the ability to 
"observe" the MultiThreadedConnectionManager, a 
"ConnectionPoolObserver"?  Then clients could use that as an opportunity 
to configure the behavior.  Several possible functions we could put on that:
- newConnectionCreated(connectionmanager, server)
- connectionReused(connectionmanager, server, numIdle, numInUse)
- connectionRecycled(connectionmanager, server, numIdle, numInUse)

Then clients of the manager could "control" the behavior without having 
to write a completely new implementation.  A client could, for example, 
decide that one particular server was allowed to have many connections, 
but restrict all other servers to a maximum of two.

-Elric.

Michael Becke wrote:

>> The second thing has to do with how Keep-alive connections behave. 
>> This is a
>> multi-threaded app, using MultiThreadedHttpConnectionManager. It 
>> works great,
>> however I don't get much benefit of the shared Connections, because 
>> I'm not
>> connecting to the same site more than once, generally. That's OK, the 
>> problem
>> I run into is that after running for not very long, I suddenly start 
>> getting
>> everything timing out. It's hard to really pinpoint the timing, 
>> giving all the
>> activity, and no thread identifiers in the log messages, but I think 
>> what is
>> happening is that the system is simply running out of file handles or
>> system-level connections. A quick "netstat -n" shows a whole bunch of 
>> open,
>> TIME_WAIT, and other connections. It seems that the Connection 
>> Manager is
>> keeping them around for re-use, and following HTTP/1.1. One fix was 
>> to send
>> "Connection: close" as a RequestHeader, which really fixed things up, 
>> but now
>> I am running into sites that are not responding, and not timing out. 
>> The log
>> traces into ReadRawLine() and just sits there. I am still tracking 
>> this down,
>> I just wonder if anyone else has seen this also?
>
>
> The problem, as other have been hinting at, is that connections never 
> get destroyed.  By default, a max of two connections are created per 
> HostConfiguration.  If you are never connecting to the same host more 
> than once this is a bit of a problem.  The 
> MultiThreadedHttpConnectionManager was designed to reuse connections 
> for a small number of hosts.  So as you have guessed the connection 
> manager will continue to create connections to new hosts and will 
> never reclaim the old ones.  We have a couple of options for fixing 
> this, here are a few:
>
> - create a cap on the max number of connections that can be created.  
> once reached unused connections will be reclaimed
> - implement some kind of connection cache/timeout.  only connections 
> that have been used in the last X milliseconds will be kept
> - implement a new type of connection manager that better fits this 
> kind of process.  in particular it would focus less on connections per 
> host, but more on total consecutive connections.  in general we could 
> introduce a whole new set of connection managers they are optimized 
> for different use scenarios
>
> These are just a few ideas.  What do others think?
>
> Mike
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: 
> commons-httpclient-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: 
> commons-httpclient-dev-help@jakarta.apache.org
>


Re: CRLF and Connection: close

Posted by Michael Becke <be...@u.washington.edu>.
> The second thing has to do with how Keep-alive connections behave. 
> This is a
> multi-threaded app, using MultiThreadedHttpConnectionManager. It works 
> great,
> however I don't get much benefit of the shared Connections, because 
> I'm not
> connecting to the same site more than once, generally. That's OK, the 
> problem
> I run into is that after running for not very long, I suddenly start 
> getting
> everything timing out. It's hard to really pinpoint the timing, giving 
> all the
> activity, and no thread identifiers in the log messages, but I think 
> what is
> happening is that the system is simply running out of file handles or
> system-level connections. A quick "netstat -n" shows a whole bunch of 
> open,
> TIME_WAIT, and other connections. It seems that the Connection Manager 
> is
> keeping them around for re-use, and following HTTP/1.1. One fix was to 
> send
> "Connection: close" as a RequestHeader, which really fixed things up, 
> but now
> I am running into sites that are not responding, and not timing out. 
> The log
> traces into ReadRawLine() and just sits there. I am still tracking 
> this down,
> I just wonder if anyone else has seen this also?

The problem, as other have been hinting at, is that connections never 
get destroyed.  By default, a max of two connections are created per 
HostConfiguration.  If you are never connecting to the same host more 
than once this is a bit of a problem.  The 
MultiThreadedHttpConnectionManager was designed to reuse connections 
for a small number of hosts.  So as you have guessed the connection 
manager will continue to create connections to new hosts and will never 
reclaim the old ones.  We have a couple of options for fixing this, 
here are a few:

- create a cap on the max number of connections that can be created.  
once reached unused connections will be reclaimed
- implement some kind of connection cache/timeout.  only connections 
that have been used in the last X milliseconds will be kept
- implement a new type of connection manager that better fits this kind 
of process.  in particular it would focus less on connections per host, 
but more on total consecutive connections.  in general we could 
introduce a whole new set of connection managers they are optimized for 
different use scenarios

These are just a few ideas.  What do others think?

Mike