Posted to issues@commons.apache.org by "Claus Stadler (Jira)" <ji...@apache.org> on 2021/06/07 13:52:00 UTC

[jira] [Commented] (VFS-805) HTTP seek always exhausts response

    [ https://issues.apache.org/jira/browse/VFS-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358615#comment-17358615 ] 

Claus Stadler commented on VFS-805:
-----------------------------------

So would it be possible to add changes in the spirit of the following?

{code:java}
public class MonitorInputStream extends BufferedInputStream {
  @Override
  public void close() throws IOException {
...
      try {
        closeSuper(); // Introduce a method that allows for overriding 'super.close()'
      } catch (final IOException ioe) 
...
  }

  // Must declare IOException because super.close() can throw it
  protected void closeSuper() throws IOException {
    super.close();
  }
}

final class MonitoredHttpResponseContentInputStream extends MonitorInputStream {
  /**
   * If the HTTP response is closeable, skip closing the underlying stream here
   * (onClose() closes the response instead) in order to avoid potentially
   * consuming all remaining data.
   */
  @Override
  protected void closeSuper() throws IOException {
    if (!(httpResponse instanceof CloseableHttpResponse)) { // Or just 'Closeable'
      super.closeSuper();
    }
  }

  // onClose remains unchanged:
  @Override
  protected void onClose() throws IOException {
    if (httpResponse instanceof CloseableHttpResponse) {
      ((CloseableHttpResponse) httpResponse).close();
    }
  }
}

{code} 
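
To illustrate the effect of such a hook in isolation, here is a minimal, self-contained sketch that does not use VFS or any HTTP client. All class names in it (DrainingOnCloseStream, HookedStream, ResponseAwareStream, CloseHookDemo) are hypothetical stand-ins: the first imitates a body stream whose close() reads all remaining data, the second imitates a MonitorInputStream with an overridable closeSuper(), and the "response" is just a no-op Closeable. It assumes that closing the real response releases the connection without draining the body, which would need to be verified against the actual client library.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a body stream whose close() drains all remaining bytes. */
class DrainingOnCloseStream extends FilterInputStream {
    long drained = 0;

    DrainingOnCloseStream(InputStream in) { super(in); }

    @Override
    public void close() throws IOException {
        while (in.read() != -1) { drained++; } // imitates exhausting the response body
        super.close();
    }
}

/** Simplified stand-in for a monitor stream: close() delegates to an overridable hook. */
class HookedStream extends FilterInputStream {
    private boolean closed;

    HookedStream(InputStream in) { super(in); }

    @Override
    public void close() throws IOException {
        if (closed) { return; }
        closed = true;
        try {
            closeSuper();
        } finally {
            onClose();
        }
    }

    /** Default behaviour: close the wrapped stream. */
    protected void closeSuper() throws IOException { super.close(); }

    protected void onClose() throws IOException { }
}

/** Closes the response handle instead of the body stream when one is available. */
class ResponseAwareStream extends HookedStream {
    private final Closeable response; // assumption: closing this does not drain the body

    ResponseAwareStream(InputStream in, Closeable response) {
        super(in);
        this.response = response;
    }

    @Override
    protected void closeSuper() throws IOException {
        if (response == null) { super.closeSuper(); }
    }

    @Override
    protected void onClose() throws IOException {
        if (response != null) { response.close(); }
    }
}

class CloseHookDemo {
    /** Reads 10 bytes of a 1000-byte body, closes, and returns how many bytes close() drained. */
    static long drainedAfterClose(boolean useResponse) throws IOException {
        DrainingOnCloseStream body =
            new DrainingOnCloseStream(new ByteArrayInputStream(new byte[1000]));
        Closeable response = useResponse ? () -> { } : null;
        try (InputStream in = new ResponseAwareStream(body, response)) {
            in.read(new byte[10]);
        }
        return body.drained;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Without response: drained " + drainedAfterClose(false));
        System.out.println("With response: drained " + drainedAfterClose(true));
    }
}
```

In this sketch, closing without a response handle drains the 990 unread bytes, while closing via the response handle drains none. Whether the same holds for the real HTTP response depends on how its close() is implemented.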

 

 

> HTTP seek always exhausts response
> ----------------------------------
>
>                 Key: VFS-805
>                 URL: https://issues.apache.org/jira/browse/VFS-805
>             Project: Commons VFS
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> Seeking on an HTTP resource always downloads ALL content if a Content-Length header is present. The problem is that seeking closes the current input stream which eventually ends up in ContentLengthInputStream.close() of the (ancient) http client library.
>  
> To be clear, the problem is actually not with the seek itself, but with the underlying close implementation that always exhausts the HTTP response body. See the example below.
>  
> My use case is to perform binary search on sorted datasets on the Web (RDF data in sorted ntriple syntax) - the binary search works locally and *in principle* works on HTTP resources abstracted with VFS2, but the seek implementation that downloads *ALL* data (in my case several GBs) unfortunately defeats the purpose :(
>  
> From org.apache.commons.httpclient.ContentLengthInputStream (commons-httpclient-3.1):
> {code:java}
>     public void close() throws IOException {
>         if (!closed) {
>             try {
>                 ChunkedInputStream.exhaustInputStream(this);
>             } finally {
>                 // close after above so that we don't throw an exception trying
>                 // to read after closed!
>                 closed = true;
>             }
>         }
>     }
> {code}
> Example:
> {code:java}
> 	public static void main(String[] args) throws Exception {
> 		String url = "http://localhost/large-file-2gb.txt";
> 		FileSystemManager fsManager = VFS.getManager();
> 		
> 		try (FileObject file = fsManager.resolveFile(url)) {	
> 			try (RandomAccessContent r = file.getContent().getRandomAccessContent(RandomAccessMode.READ)) {
> 				
> 				StopWatch sw1 = StopWatch.createStarted();
> 				r.seek(20);
> 				System.out.println("Initial seek: " + sw1.getTime(TimeUnit.MILLISECONDS));
> 				StopWatch sw2 = StopWatch.createStarted();
> 				byte[] bytes = new byte[100];
> 				r.readFully(bytes);
> 				System.out.println("Read: " + sw2.getTime(TimeUnit.MILLISECONDS));
> 				
> 				StopWatch sw3 = StopWatch.createStarted();
> 				r.seek(100);
> 				System.out.println("Subsequent seek: " + sw3.getTime(TimeUnit.MILLISECONDS));
> 			}
> 		}
> 		System.out.println("Done");
> 	}
> {code}
> Output (times in milliseconds):
> {code}
> Initial seek: 0
> Read: 4
> Subsequent seek: 2538
> Done
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)