You are viewing a plain text version of this content. The canonical link for it is here.
Posted to httpclient-users@hc.apache.org by Jim Roepcke <ji...@roepcke.com> on 2005/07/03 01:49:04 UTC

Making ContentLengthInputStream stop when I want, is this safe?

Hi,

I'm writing an application where the user must be able to interrupt  
and resume downloads at will.

I was noticing noticeable pauses when I called releaseConnection() on  
the GetMethod that I was reading the response from.  Recently I  
realized this is because deep in HttpClient, releaseConnection  
eventually calls exhaustInputStream to read and discard the rest of  
the response (!!!) before closing the connection.  Since the file  
downloads the user will interrupt are potentially many gigabytes in  
size, having releaseConnection block while it reads and discards many  
gigabytes of data is obviously very bad.

I created a workaround.  It seems to be working for me, ie:  
releaseConnection is no longer pausing, but I am guessing the  
developers of HttpClient had something important in mind when they  
decided to FORCE responses to be COMPLETELY read.  Could you please  
let me know if my hack is going to cause bad side-effects?

Background info: I'm using the MultiThreadedHttpConnectionManager.   
At any time I could have up to one (significant) GET download, one  
(significant) PUT upload, and possibly a very small number of very  
short DELETE, HEAD and GET requests executing.

The hack:

In ContentLengthInputStream, I added a method forceSkipTheRest().  It  
does the following:

     public void forceSkipTheRest() {
         this.contentLength = this.pos;
     }

In AutoCloseInputStream, I added an accessor method to get the  
wrapped [ContentLength]InputStream, so that I can call the  
forceSkipTheRest method on it.

Here is a snippet of code from my Download worker code (which runs in  
a separate thread managed by Doug Lea's util.concurrent's  
QueuedExectutor class).  Please refer to the section near the bottom  
inside the if (shouldStop()) block for the use of the gross hack...

// [snip]
client.executeMethod(hc, get);
int status_code_get = get.getStatusCode();

if ((status_code_get / 100) == 2) { // in the 200 series of responses
     responseStream = get.getResponseBodyAsStream();
     int download_stream_buf_size = 4096;
     // TODO: review again if this bufferedinputstream is needed
     // it seems it is redundant, the responseStream is probably good  
enough!!
     downloadStream = new BufferedInputStream(responseStream,  
download_stream_buf_size);
     byte[] downloadBuffer = new byte[download_stream_buf_size];

     int bytesRead = 0;
     raf = new RandomAccessFile(downloadFile, "rw");
     while ((bytesRead = downloadStream.read(downloadBuffer, 0,  
download_stream_buf_size)) > 0) {
         raf.write(downloadBuffer, 0, bytesRead);
         totalBytesRead += bytesRead;
         if (shouldStop()) {
             if (responseStream instanceof AutoCloseInputStream) {
                 InputStream stream = ((AutoCloseInputStream) 
responseStream).getWrappedInputStream();
                 if (stream instanceof ContentLengthInputStream)
                     ((ContentLengthInputStream) 
stream).forceSkipTheRest();
             }
             break;
         }
     }
     get.releaseConnection();
     get = null;
     // [snip]

In some quick testing, it seems this is working just fine... however,  
I wonder if any bad side effects will result from this.

I'm also wondering if perhaps I should simply set a flag in  
ContentLengthInputStream, and if that flag is true, when close() is  
called on ContentLegnthInputStream it calls forceSkipTheRest before  
calling exhaustInputStream (so that this interruption works the way I  
need in all cases, not just in my read loop).

Your advice is *greatly* appreciated!!

Thanks,

Jim


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Making ContentLengthInputStream stop when I want, is this safe?

Posted by Jim Roepcke <ji...@roepcke.com>.
On 2-Jul-05, at 6:47 PM, Jim Roepcke wrote:

> I did try get.abort() before devising the workaround, but after  
> calling abort, subsequent HTTP methods weren't being executed.  ...  
> I just tried it again.  It works a few times but then eventually  
> HTTP methods just don't execute.  My guess is, since the  
> connections aren't (as far as I can tell) being released to the  
> pool when you call abort(), eventually the  
> MultiThreadedHttpConnectionManager's connection pool is starved and  
> it has nothing to offer the HttpClient.

I've done some more debugging.  I think I was right, the thread that  
calls client.execute is blocked, waiting in  
MultiThreadedHttpConnectionManager.doGetConnection (which is called  
by MultiThreadedHttpConnectionManager .getConnectionWithTimeout,  
called by HttpMethodDirector.executeMethod).

The timeout value would appear to be the default of 0 (since I didn't  
configure that in any way), which appears to mean no timeout, which  
explains why the QueuedExecutor handling my transfer commands is  
blocked.

 From the looks of things (please correct me if I'm wrong) even if I  
set a timeout value it would just result in client.execute method  
throwing a connection pool timeout exception instead of having the  
connection pool re-stock itself with new connections.

Is this the intended behavior of abort(), or should it be notifying  
the connection manager so it can clean up and keep the connection  
pool stocked?

> Looking at abort(), it calls close() on the connection, but not  
> releaseConnection(), which would in turn call  
> httpConnectionManager.releaseConnection(this).  I assume this is  
> why you say it won't be re-used?  On a quick inspection, I don't  
> see the MultiThreadedHttpConnectionManager being notified that the  
> connection is finished or unusable.  Do I have to either tell it  
> somehow or create a new connection manager?  It doesn't appear abort 
> () does anything to tell the connection manager the connection is  
> now unusable.

Thanks again for your help,

Jim



---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Making ContentLengthInputStream stop when I want, is this safe?

Posted by Jim Roepcke <ji...@roepcke.com>.
On 2-Jul-05, at 11:25 PM, Jim Roepcke wrote:

> Ah, okay.  I thought it was releaseConnection or abort.  I was also  
> a little scared to call releaseConnection because it was doing the  
> exhaustInputStream, but I suppose if I abort first there won't be  
> anything for it to exhaust, right?
> ....snip....
> Yeah, I do have a finally statement that checks if get != null and  
> then does releaseConnection, it's outside of the try block  
> (obviously) which I didn't put in the [snip]'d code I pasted in here.
>
> I'll give this a try now, thanks again for your help!

Some preliminary testing shows it's working wonderfully!   
exhaustInputStream is still being called but there's nothing left to  
read. :)

Perfect!

I think I'm fine on the upload side, since my upload worker is  
implementing RequestEntity, and it breaks out of the write loop in  
writeRequest when the the user signals the upload to stop.

Thanks again,

Jim


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Making ContentLengthInputStream stop when I want, is this safe?

Posted by Jim Roepcke <ji...@roepcke.com>.
On 2-Jul-05, at 8:16 PM, Michael Becke wrote:

>> Thanks for the advice. :-)  That's interesting that the content would
>> still be on the wire.  I guess that's a part of HttpClient or sockets
>> I don't understand well enough -- I assumed that when the socket was
>> closed the remaining bytes simply "could not get through", and that
>> after exhaustInputStream(), the socket would soon after be closed.
>>
>
> Connections/sockets can be reused, depending on the configuration of
> the client/server.  Releasing a connection does not necessarily close
> it.

Server is Tomcat 5.5.9, not doing anything special except making sure  
Content-Length is sent and handling Range/Content-Range headers  
properly.  One hack though is on HEAD requests when the file I'd  
serve is greater than 2GB, I can't set the Content-Length to be the  
file's size because it'll only take an int, so I set the Content- 
Length to 0 and set another header "X-Long-Content-Length" that my  
HEAD request code watches for.  I know that's terrible put I didn't  
have the time to fix it.  Coyote actually supports long content  
requests but I think something higher up in the Servlet API doesn't  
and it's next to impossible to get to the Coyote response object to  
get it to work with a long without modifying lots of Tomcat/Servlet  
2.4 code.

> Yep, sounds like you're leaking connections.  Even when using abort()
> you still need to be sure to call releaseConnection().  As a rule
> releaseConnection() needs to be called every time executeMethod() is
> called.

Ah, okay.  I thought it was releaseConnection or abort.  I was also a  
little scared to call releaseConnection because it was doing the  
exhaustInputStream, but I suppose if I abort first there won't be  
anything for it to exhaust, right?

> You should probably put releaseConnection() in a finally statement, so
> you can be sure it is always called.

Yeah, I do have a finally statement that checks if get != null and  
then does releaseConnection, it's outside of the try block  
(obviously) which I didn't put in the [snip]'d code I pasted in here.

I'll give this a try now, thanks again for your help!

Jim


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Making ContentLengthInputStream stop when I want, is this safe?

Posted by Michael Becke <mb...@gmail.com>.
> Thanks for the advice. :-)  That's interesting that the content would
> still be on the wire.  I guess that's a part of HttpClient or sockets
> I don't understand well enough -- I assumed that when the socket was
> closed the remaining bytes simply "could not get through", and that
> after exhaustInputStream(), the socket would soon after be closed.

Connections/sockets can be reused, depending on the configuration of
the client/server.  Releasing a connection does not necessarily close
it.

> I did try get.abort() before devising the workaround, but after
> calling abort, subsequent HTTP methods weren't being executed.  ... I
> just tried it again.  It works a few times but then eventually HTTP
> methods just don't execute.  My guess is, since the connections
> aren't (as far as I can tell) being released to the pool when you
> call abort(), eventually the MultiThreadedHttpConnectionManager's
> connection pool is starved and it has nothing to offer the HttpClient.
> 
> Looking at abort(), it calls close() on the connection, but not
> releaseConnection(), which would in turn call
> httpConnectionManager.releaseConnection(this).  I assume this is why
> you say it won't be re-used?  On a quick inspection, I don't see the
> MultiThreadedHttpConnectionManager being notified that the connection
> is finished or unusable.  Do I have to either tell it somehow or
> create a new connection manager?  It doesn't appear abort() does
> anything to tell the connection manager the connection is now unusable.

Yep, sounds like you're leaking connections.  Even when using abort()
you still need to be sure to call releaseConnection().  As a rule
releaseConnection() needs to be called every time executeMethod() is
called.

> Here's how I have the read loop when I use abort():
> 
>       while ((bytesRead = downloadStream.read(downloadBuffer, 0,
> download_stream_buf_size)) > 0) {
>           raf.write(downloadBuffer, 0, bytesRead);
>           totalBytesRead += bytesRead;
>           if (shouldStop()) {
>               get.abort();
>               get = null
>               break;
>           }
>       }
>       if (get != null) {
>           get.releaseConnection();
>           get = null;
>       }
>       // [snip]
> 
> Do you seen anything wrong with that? (Assuming abort() is working
> perfectly)

You should probably put releaseConnection() in a finally statement, so
you can be sure it is always called.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Making ContentLengthInputStream stop when I want, is this safe?

Posted by Jim Roepcke <ji...@roepcke.com>.
On 2-Jul-05, at 4:56 PM, Michael Becke wrote:

> Hi Jim,
>
> HttpMethod.abort() is probably what you're looking for.
>
> Your workaround solution could potentially cause some issues.  In
> particular it will be a problem if you end up reusing a connection
> that has been skipped.  The remaining content will still be on the
> wire and it will be read the next time the connection is used.
> Reading to the end of the response is specifically done so that
> connections can be reused.  In your case it sounds like it would be
> better to abort the method, which causes the connection to be closed
> and not reused.

Hi Michael,

Thanks for the advice. :-)  That's interesting that the content would  
still be on the wire.  I guess that's a part of HttpClient or sockets  
I don't understand well enough -- I assumed that when the socket was  
closed the remaining bytes simply "could not get through", and that  
after exhaustInputStream(), the socket would soon after be closed.

I did try get.abort() before devising the workaround, but after  
calling abort, subsequent HTTP methods weren't being executed.  ... I  
just tried it again.  It works a few times but then eventually HTTP  
methods just don't execute.  My guess is, since the connections  
aren't (as far as I can tell) being released to the pool when you  
call abort(), eventually the MultiThreadedHttpConnectionManager's  
connection pool is starved and it has nothing to offer the HttpClient.

Looking at abort(), it calls close() on the connection, but not  
releaseConnection(), which would in turn call  
httpConnectionManager.releaseConnection(this).  I assume this is why  
you say it won't be re-used?  On a quick inspection, I don't see the  
MultiThreadedHttpConnectionManager being notified that the connection  
is finished or unusable.  Do I have to either tell it somehow or  
create a new connection manager?  It doesn't appear abort() does  
anything to tell the connection manager the connection is now unusable.

Here's how I have the read loop when I use abort():

      while ((bytesRead = downloadStream.read(downloadBuffer, 0,  
download_stream_buf_size)) > 0) {
          raf.write(downloadBuffer, 0, bytesRead);
          totalBytesRead += bytesRead;
          if (shouldStop()) {
              get.abort();
              get = null
              break;
          }
      }
      if (get != null) {
          get.releaseConnection();
          get = null;
      }
      // [snip]

Do you seen anything wrong with that? (Assuming abort() is working  
perfectly)

I appreciate your advice, hopefully I can get this working properly  
using abort(), because it sounds like that is the intended device for  
my needs.

If you have any suggestions for improvements to abort, I'm very happy  
to try them here.

Thanks again,

Jim

> On 7/2/05, Jim Roepcke <ji...@roepcke.com> wrote:
>
>> Hi,
>>
>> I'm writing an application where the user must be able to interrupt
>> and resume downloads at will.
>>
>> I was noticing noticeable pauses when I called releaseConnection() on
>> the GetMethod that I was reading the response from.  Recently I
>> realized this is because deep in HttpClient, releaseConnection
>> eventually calls exhaustInputStream to read and discard the rest of
>> the response (!!!) before closing the connection.  Since the file
>> downloads the user will interrupt are potentially many gigabytes in
>> size, having releaseConnection block while it reads and discards many
>> gigabytes of data is obviously very bad.
>>
>> I created a workaround.  It seems to be working for me, ie:
>> releaseConnection is no longer pausing, but I am guessing the
>> developers of HttpClient had something important in mind when they
>> decided to FORCE responses to be COMPLETELY read.  Could you please
>> let me know if my hack is going to cause bad side-effects?
>>
>> Background info: I'm using the MultiThreadedHttpConnectionManager.
>> At any time I could have up to one (significant) GET download, one
>> (significant) PUT upload, and possibly a very small number of very
>> short DELETE, HEAD and GET requests executing.
>>
>> The hack:
>>
>> In ContentLengthInputStream, I added a method forceSkipTheRest().  It
>> does the following:
>>
>>      public void forceSkipTheRest() {
>>          this.contentLength = this.pos;
>>      }
>>
>> In AutoCloseInputStream, I added an accessor method to get the
>> wrapped [ContentLength]InputStream, so that I can call the
>> forceSkipTheRest method on it.
>>
>> Here is a snippet of code from my Download worker code (which runs in
>> a separate thread managed by Doug Lea's util.concurrent's
>> QueuedExectutor class).  Please refer to the section near the bottom
>> inside the if (shouldStop()) block for the use of the gross hack...
>>
>> // [snip]
>> client.executeMethod(hc, get);
>> int status_code_get = get.getStatusCode();
>>
>> if ((status_code_get / 100) == 2) { // in the 200 series of responses
>>      responseStream = get.getResponseBodyAsStream();
>>      int download_stream_buf_size = 4096;
>>      // TODO: review again if this bufferedinputstream is needed
>>      // it seems it is redundant, the responseStream is probably good
>> enough!!
>>      downloadStream = new BufferedInputStream(responseStream,
>> download_stream_buf_size);
>>      byte[] downloadBuffer = new byte[download_stream_buf_size];
>>
>>      int bytesRead = 0;
>>      raf = new RandomAccessFile(downloadFile, "rw");
>>      while ((bytesRead = downloadStream.read(downloadBuffer, 0,
>> download_stream_buf_size)) > 0) {
>>          raf.write(downloadBuffer, 0, bytesRead);
>>          totalBytesRead += bytesRead;
>>          if (shouldStop()) {
>>              if (responseStream instanceof AutoCloseInputStream) {
>>                  InputStream stream = ((AutoCloseInputStream)
>> responseStream).getWrappedInputStream();
>>                  if (stream instanceof ContentLengthInputStream)
>>                      ((ContentLengthInputStream)
>> stream).forceSkipTheRest();
>>              }
>>              break;
>>          }
>>      }
>>      get.releaseConnection();
>>      get = null;
>>      // [snip]
>>
>> In some quick testing, it seems this is working just fine... however,
>> I wonder if any bad side effects will result from this.
>>
>> I'm also wondering if perhaps I should simply set a flag in
>> ContentLengthInputStream, and if that flag is true, when close() is
>> called on ContentLegnthInputStream it calls forceSkipTheRest before
>> calling exhaustInputStream (so that this interruption works the way I
>> need in all cases, not just in my read loop).
>>
>> Your advice is *greatly* appreciated!!
>>
>> Thanks,
>>
>> Jim


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Making ContentLengthInputStream stop when I want, is this safe?

Posted by Michael Becke <mb...@gmail.com>.
Hi Jim,

HttpMethod.abort() is probably what you're looking for.

Your workaround solution could potentially cause some issues.  In
particular it will be a problem if you end up reusing a connection
that has been skipped.  The remaining content will still be on the
wire and it will be read the next time the connection is used. 
Reading to the end of the response is specifically done so that
connections can be reused.  In your case it sounds like it would be
better to abort the method, which causes the connection to be closed
and not reused.

Mike

On 7/2/05, Jim Roepcke <ji...@roepcke.com> wrote:
> Hi,
> 
> I'm writing an application where the user must be able to interrupt
> and resume downloads at will.
> 
> I was noticing noticeable pauses when I called releaseConnection() on
> the GetMethod that I was reading the response from.  Recently I
> realized this is because deep in HttpClient, releaseConnection
> eventually calls exhaustInputStream to read and discard the rest of
> the response (!!!) before closing the connection.  Since the file
> downloads the user will interrupt are potentially many gigabytes in
> size, having releaseConnection block while it reads and discards many
> gigabytes of data is obviously very bad.
> 
> I created a workaround.  It seems to be working for me, ie:
> releaseConnection is no longer pausing, but I am guessing the
> developers of HttpClient had something important in mind when they
> decided to FORCE responses to be COMPLETELY read.  Could you please
> let me know if my hack is going to cause bad side-effects?
> 
> Background info: I'm using the MultiThreadedHttpConnectionManager.
> At any time I could have up to one (significant) GET download, one
> (significant) PUT upload, and possibly a very small number of very
> short DELETE, HEAD and GET requests executing.
> 
> The hack:
> 
> In ContentLengthInputStream, I added a method forceSkipTheRest().  It
> does the following:
> 
>      public void forceSkipTheRest() {
>          this.contentLength = this.pos;
>      }
> 
> In AutoCloseInputStream, I added an accessor method to get the
> wrapped [ContentLength]InputStream, so that I can call the
> forceSkipTheRest method on it.
> 
> Here is a snippet of code from my Download worker code (which runs in
> a separate thread managed by Doug Lea's util.concurrent's
> QueuedExectutor class).  Please refer to the section near the bottom
> inside the if (shouldStop()) block for the use of the gross hack...
> 
> // [snip]
> client.executeMethod(hc, get);
> int status_code_get = get.getStatusCode();
> 
> if ((status_code_get / 100) == 2) { // in the 200 series of responses
>      responseStream = get.getResponseBodyAsStream();
>      int download_stream_buf_size = 4096;
>      // TODO: review again if this bufferedinputstream is needed
>      // it seems it is redundant, the responseStream is probably good
> enough!!
>      downloadStream = new BufferedInputStream(responseStream,
> download_stream_buf_size);
>      byte[] downloadBuffer = new byte[download_stream_buf_size];
> 
>      int bytesRead = 0;
>      raf = new RandomAccessFile(downloadFile, "rw");
>      while ((bytesRead = downloadStream.read(downloadBuffer, 0,
> download_stream_buf_size)) > 0) {
>          raf.write(downloadBuffer, 0, bytesRead);
>          totalBytesRead += bytesRead;
>          if (shouldStop()) {
>              if (responseStream instanceof AutoCloseInputStream) {
>                  InputStream stream = ((AutoCloseInputStream)
> responseStream).getWrappedInputStream();
>                  if (stream instanceof ContentLengthInputStream)
>                      ((ContentLengthInputStream)
> stream).forceSkipTheRest();
>              }
>              break;
>          }
>      }
>      get.releaseConnection();
>      get = null;
>      // [snip]
> 
> In some quick testing, it seems this is working just fine... however,
> I wonder if any bad side effects will result from this.
> 
> I'm also wondering if perhaps I should simply set a flag in
> ContentLengthInputStream, and if that flag is true, when close() is
> called on ContentLegnthInputStream it calls forceSkipTheRest before
> calling exhaustInputStream (so that this interruption works the way I
> need in all cases, not just in my read loop).
> 
> Your advice is *greatly* appreciated!!
> 
> Thanks,
> 
> Jim
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org