Posted to httpclient-users@hc.apache.org by Spencer Lee <sp...@gmail.com> on 2006/08/01 19:05:50 UTC

Unable to retrieve all data of a large (>30 MB) file.

Hello,

Just wondering if anyone has had problems retrieving all the data of a
large file.

Here's basically the code I've been running...

...
            HttpClient httpclient = new HttpClient(params);
            httpclient.getHostConfiguration().setHost("myhost", 80, "http");
            httpclient.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
            httpclient.getHostConfiguration().setProxy("webproxy", 80);
...
            // Create a method instance.
            GetMethod method = new GetMethod(myurl);
            DefaultHttpMethodRetryHandler retryhandler =
                    new DefaultHttpMethodRetryHandler(10, false);
            HttpMethodParams method_params = method.getParams();
            method_params.setParameter(HttpMethodParams.RETRY_HANDLER, retryhandler);
...
            // executing the method
            int statusCode = httpclient.executeMethod(method);

            if(statusCode != HttpStatus.SC_OK) {
                System.out.println("ERROR: Status code " + statusCode);
                return;
            }

            // Read the response body.
            InputStream inpStream = method.getResponseBodyAsStream();

            BufferedInputStream bufinstrm = new BufferedInputStream(inpStream);
            long totalBytesRead = 0;
            while(true) {
                byte[] buffer = new byte[12*1024];
                int bytesRead;

                bytesRead = bufinstrm.read(buffer);
                if(bytesRead == -1) {
                    System.err.println("Read -1.  Read complete.  Total Bytes Read: " + totalBytesRead + " bytes");
                    break;
                }

                // bufotstrm.write(buffer, 0, bytesRead);
                totalBytesRead += bytesRead;
                System.err.println("Total Bytes Read: " + totalBytesRead + " bytes");
            }

            bufinstrm.close();

            // close
            method.releaseConnection();



The byte size of the file is 36,197,082.  But I am only able to retrieve
all 36,197,082 bytes once in a while.  On most occasions my
BufferedInputStream (bufinstrm) will return a -1 to signal EOF before all
bytes are read.

e.g.
  ...
  Total Bytes Read: 26796032 bytes
  Read -1.  Read complete.  Total Bytes Read: 26796032 bytes
  >

Anyone come across anything similar to this before?

Thanks!

Spencer

Also...
* I'm using release commons-httpclient-3.0.1
* I am not seeing any abnormal messages printed to the logs.
* I am working through a web proxy server.
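
One way to catch this kind of silent truncation is to compare the number of bytes actually read against the length the server announced. The following is only a minimal sketch using the JDK, not code from this thread: the class name LengthCheckedCopy and the method copyExpecting are hypothetical, and the expected length is assumed to come from the Content-Length response header (or from a known file size, such as the 36,197,082 bytes mentioned above). If the response has no usable length, for example with chunked transfer encoding, pass a negative value to skip the check.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a stream and fails loudly if it ends early.
final class LengthCheckedCopy {
    private LengthCheckedCopy() {}

    /**
     * Copies everything from in to out and returns the number of bytes copied.
     * Throws EOFException if the stream ends before expectedLength bytes were
     * read. A negative expectedLength disables the check.
     */
    static long copyExpecting(InputStream in, OutputStream out, long expectedLength)
            throws IOException {
        byte[] buffer = new byte[12 * 1024]; // allocated once, outside the loop
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        if (expectedLength >= 0 && total < expectedLength) {
            throw new EOFException(
                    "Stream ended after " + total + " of " + expectedLength + " bytes");
        }
        return total;
    }
}
```

With a check like this, a download that stops at 26,796,032 bytes is reported as an error instead of looking like a normal end of stream, which makes it easier to tell a dropped connection from a complete transfer.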

Re: Unable to retrieve all data of a large (>30 MB) file.

Posted by Julius Davies <ju...@cucbc.com>.
Hi, Spencer,

That's so strange.  I've never had any problems with HTTPClient and
large files (using version 3.0rc3 and version 2.0 - I haven't done any
"large file" testing in a while!).

I've managed to consistently download 750MB files no problem.  But I'm
not using any proxies, and my server was set up with really long
timeouts.

You could try looking in the proxy logs.  Maybe it's killing the
connection because it's taking too long?

Also see if you can get the same symptoms when avoiding the proxy.  If
you just go straight through to your destination on the other side of
the proxy do you get the same problems?


yours,

Julius


On Tue, 2006-01-08 at 15:01 -0700, Spencer Lee wrote:
> Unfortunately no luck at all.  I haven't tried it yet with wget or
> curl.  But downloading from IE or Firefox seems to work without any
> problems.
> 
> It's worth noting that eliminating the use of the BufferedInputStream
> did not cause a performance hit.  I poked through the code a bit and
> it looks like the returned InputStreams are already wrapped in a
> BufferedInputStream.
> 
> At this point I've decided to just not use http-client and try to put
> together a quick (quicker at least) hack to retrieve the data I need.
> 
> -Spencer
> 
> On 8/1/06, Julius Davies <ju...@cucbc.com> wrote:
>         No problem!
>         
>         Any luck?  I'm especially curious if you see the same behaviour with
>         "curl" or "wget".
>         
>         yours,
>         
>         Julius
>         
>         
>         On Tue, 2006-01-08 at 10:36 -0700, Spencer Lee wrote: 
>         > Thank you much for the comments.   Style comments are always much
>         > appreciated!  Anyhow, I'll let you know if your suggestions affect the
>         > outcome (particularly the (non)use of the buffered stream).
>         >
>         > Regards,
>         > Spencer


-- 
Julius Davies
Senior Application Developer, Technology Services
Credit Union Central of British Columbia
http://www.cucbc.com/
Tel: 604-730-6385
Cel: 604-868-7571
Fax: 604-737-5910

1441 Creekside Drive
Vancouver, BC
Canada
V6J 4S7

http://juliusdavies.ca/


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Unable to retrieve all data of a large (>30 MB) file.

Posted by Julius Davies <ju...@cucbc.com>.
Hi, Spencer,

Your code looks okay to me.  If you use a regular web browser, with the
same proxy settings, can it always download the file?  Also try with
"curl" or "wget" on the command line.

I have some code style nitpicking below, but I don't think it will make
any difference.  Good work on using InputStream.read( byte[] )!  It
performs so much better than read() read() read() read()....!

-------------------------------------

I would do this before the "while( true )" rather than inside it:

byte[] buffer = new byte[12*1024];

I would try testing a few times without the "BufferedInputStream", but
just out of curiosity.  BufferedInputStream tends to help a lot, but
since you're already using InputStream.read( byte[]  ),
BufferedInputStream isn't going to be quite as amazing as he usually is.

Concerning code style, I think this looks cleaner (sorry to be so
obnoxious!):

long totalBytesRead = 0;
try
{
  byte[] buf = new byte[ 4096 ];
  int r = in.read( buf );
  while ( r >= 0 )
  {
    if ( r > 0 )
    {
      totalBytesRead += r;
      out.write( buf, 0, r );
    }
    r = in.read( buf );
  }
}
finally
{
  try { if ( in != null ) in.close(); } catch ( IOException ioe ) {}
  try { if ( out != null ) out.close(); } catch ( IOException ioe ) {}
}
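
For readers on Java 7 or later, the same loop can be written with try-with-resources, which closes both streams even when the copy fails. This is a sketch under that assumption rather than the code from this message: StreamCopy and copyAndClose are hypothetical names, and unlike the snippet above, a failure in close() is reported (either as the thrown exception or as a suppressed one) instead of being silently ignored.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper mirroring the read loop above with try-with-resources.
final class StreamCopy {
    private StreamCopy() {}

    /**
     * Copies in to out, closes both streams, and returns the number of
     * bytes copied.
     */
    static long copyAndClose(InputStream in, OutputStream out) throws IOException {
        // Resources are closed in reverse order (dst, then src) on exit.
        try (InputStream src = in; OutputStream dst = out) {
            byte[] buf = new byte[4096]; // one buffer, reused for every read
            long totalBytesRead = 0;
            int r;
            while ((r = src.read(buf)) >= 0) {
                dst.write(buf, 0, r);
                totalBytesRead += r;
            }
            return totalBytesRead;
        }
    }
}
```

The returned count can then be compared with the expected file size to see whether the stream ended early.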


yours,

Julius

