Posted to issues@commons.apache.org by "Gary D. Gregory (Jira)" <ji...@apache.org> on 2021/07/15 19:58:00 UTC
[jira] [Resolved] (VFS-805) HTTP seek always exhausts response
[ https://issues.apache.org/jira/browse/VFS-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary D. Gregory resolved VFS-805.
---------------------------------
Fix Version/s: 2.9.0
Resolution: Fixed
> HTTP seek always exhausts response
> ----------------------------------
>
> Key: VFS-805
> URL: https://issues.apache.org/jira/browse/VFS-805
> Project: Commons VFS
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Claus Stadler
> Priority: Major
> Fix For: 2.9.0
>
>
> Seeking on an HTTP resource always downloads ALL content if a Content-Length header is present. The problem is that seeking closes the current input stream, and that close eventually ends up in ContentLengthInputStream.close() of the (ancient) HTTP client library.
>
> To be clear, the problem is actually not with the seek itself, but with the underlying close implementation that always exhausts the HTTP response body. See the example below.
>
> My use case is binary search over sorted datasets on the Web (RDF data in sorted N-Triples syntax). The binary search works locally and *in principle* also works on HTTP resources abstracted with VFS2, but the seek implementation downloads *ALL* data (in my case several GBs), which unfortunately defeats the purpose :(
>
> From org.apache.commons.httpclient.ContentLengthInputStream (commons-httpclient-3.1):
> {code:java}
> public void close() throws IOException {
>     if (!closed) {
>         try {
>             ChunkedInputStream.exhaustInputStream(this);
>         } finally {
>             // close after above so that we don't throw an exception trying
>             // to read after closed!
>             closed = true;
>         }
>     }
> }
> {code}
> Example:
> {code:java}
> public static void main(String[] args) throws Exception {
>     String url = "http://localhost/large-file-2gb.txt";
>     FileSystemManager fsManager = VFS.getManager();
>
>     try (FileObject file = fsManager.resolveFile(url)) {
>         try (RandomAccessContent r = file.getContent().getRandomAccessContent(RandomAccessMode.READ)) {
>
>             StopWatch sw1 = StopWatch.createStarted();
>             r.seek(20);
>             System.out.println("Initial seek: " + sw1.getTime(TimeUnit.MILLISECONDS));
>             StopWatch sw2 = StopWatch.createStarted();
>             byte[] bytes = new byte[100];
>             r.readFully(bytes);
>             System.out.println("Read: " + sw2.getTime(TimeUnit.MILLISECONDS));
>
>             StopWatch sw3 = StopWatch.createStarted();
>             r.seek(100);
>             System.out.println("Subsequent seek: " + sw3.getTime(TimeUnit.MILLISECONDS));
>         }
>     }
>     System.out.println("Done");
> }
> {code}
> Output (times in milliseconds):
> {code}
> Initial seek: 0
> Read: 4
> Subsequent seek: 2538
> Done
> {code}
>
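The cost described in the report can be reproduced without any network or HTTP library. The following is only a minimal sketch under stated assumptions: an in-memory byte array stands in for the HTTP response body, and the classes `CountingInputStream` and `DrainingStream` are hypothetical names invented for this illustration, not part of Commons VFS or commons-httpclient. `DrainingStream.close()` imitates the drain-on-close pattern quoted above by reading to end of stream before marking itself closed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainOnCloseDemo {

    /** Hypothetical helper: counts every byte pulled from the simulated network stream. */
    static class CountingInputStream extends FilterInputStream {
        long bytesRead = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                bytesRead++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesRead += n;
            }
            return n;
        }
    }

    /** Hypothetical stand-in for a body stream whose close() drains the rest of the response. */
    static class DrainingStream extends FilterInputStream {
        private boolean closed = false;

        DrainingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            if (!closed) {
                try {
                    // Read and discard everything that is left in the body.
                    byte[] buf = new byte[1024];
                    while (in.read(buf) >= 0) {
                        // discard
                    }
                } finally {
                    closed = true;
                }
            }
        }
    }

    /** Reads bytesWanted bytes, closes the stream, and reports how many bytes were pulled in total. */
    static long bytesPulled(boolean drainOnClose, int bodySize, int bytesWanted) throws IOException {
        CountingInputStream network = new CountingInputStream(new ByteArrayInputStream(new byte[bodySize]));
        InputStream body = drainOnClose ? new DrainingStream(network) : network;
        byte[] wanted = new byte[bytesWanted];
        int off = 0;
        while (off < bytesWanted) {
            int n = body.read(wanted, off, bytesWanted - off);
            if (n < 0) {
                break; // body shorter than requested
            }
            off += n;
        }
        body.close(); // a seek in the reported scenario triggers this close
        return network.bytesRead;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("drain on close: " + bytesPulled(true, 1_000_000, 100));
        System.out.println("plain close:    " + bytesPulled(false, 1_000_000, 100));
    }
}
```

With a simulated body of 1,000,000 bytes and a 100-byte read, the draining variant pulls all 1,000,000 bytes on close while the plain variant pulls only the 100 bytes that were requested. This mirrors the timing pattern in the reported output, where the first seek and the read are cheap and the subsequent seek pays for the rest of the body.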
--
This message was sent by Atlassian Jira
(v8.3.4#803005)