Posted to httpclient-users@hc.apache.org by Paranoid <pa...@ukr.net> on 2006/11/08 15:16:48 UTC

have question about buffer size

I have the source file in the attachment.
Description: I create a buffer of 20 MB, read into it, and append the size of each read to a StringBuilder. After reading, I show the StringBuilder contents. On my machine with Mustang b101, each read returns about 1440 bytes instead of 20 MB. I have a serious performance problem and need a REAL buffer, but I don't know what to do...
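
A single read() call is allowed to return fewer bytes than the buffer can hold, so one common way to get a "full" buffer is to loop until it is filled or the stream ends. The following is only a minimal sketch of that idea, not the attached source: the class name, the fill() helper and the trickle() stand-in stream (which hands out at most 1440 bytes per call) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class FillBufferSketch {

    /**
     * Reads from the stream until the buffer is full or the stream ends.
     * Returns the number of bytes actually stored in the buffer.
     */
    static int fill(InputStream in, byte[] buffer) {
        int total = 0;
        try {
            while (total < buffer.length) {
                // A single read() may return far fewer bytes than requested.
                int n = in.read(buffer, total, buffer.length - total);
                if (n < 0) {
                    break; // end of stream
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Hypothetical stand-in for a network stream: at most 1440 bytes per read. */
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 1440));
            }
        };
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100_000];
        byte[] buffer = new byte[64 * 1024];
        int filled = fill(trickle(payload), buffer);
        System.out.println("bytes in buffer: " + filled);
    }
}
```

With a loop like this, the size of the individual reads no longer matters to the caller; whether it helps CPU load depends on what is done with the data afterwards.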

Re: handle the request entities from server to client ?

Posted by Roland Weber <ht...@dubioso.net>.
Hello Bastian,

> The following code worked
> fine, but I am not sure if
> it is a 'best practice'.

It is not. See my comments inline.

>     private void requestMessages(
>             String[] messages, HttpServletResponse response) {
> 
>         StringRequestEntity s1;
>         OutputStream out;
>         
>         response.setContentType("application/octet-stream");

application/octet-stream indicates binary data. That doesn't fit
with using a StringRequestEntity. As it seems that you are really
sending strings, consider changing the content type to text/plain.

> 
>         try {
>             // SEPERATOR is a static final String
>             s1 = new StringRequestEntity(
>                     Integer.toString(messages.length) + SEPERATOR);
>             out = response.getOutputStream();
>             s1.writeRequest(out);
>             
>             for (int i = 0; i < messages.length; i++) {
>                 
>                 s1 = new StringRequestEntity(messages[i] + SEPERATOR);    
>                 out = response.getOutputStream();
>                 s1.writeRequest(out);    
>             }

Well, this _seems_ to work. But I really have no idea why you
would want to use the StringRequestEntity with the Servlet API.
Is HttpServletResponse.getWriter() not good enough for you?
Besides, you didn't specify the character encoding to use,
so your code will only work if client and server happen to use
the same platform default encoding. The string concatenations
are superfluous and a waste of processing resources. As is the
creation of intermediate request entities. Change to something
like this:

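response.setContentType("text/plain; charset=UTF-8");
Writer w = response.getWriter();
// write(int) would emit a single character, so convert the count to a String
w.write(Integer.toString(messages.length));
w.write(SEPERATOR);
for (int i = 0; i < messages.length; i++) {
  w.write(messages[i]);
  w.write(SEPERATOR);
}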

>             
>         } catch (UnsupportedEncodingException exc1) {
> // ...
> 
> 
> 
> 
> ---- client source (CLIENT)----
> 
> // ...
>     * @return Servlet response as <code>String[]</code>
>      */
>     private String[] getResponse(PostMethod post) {
>         
>         InputStream is = null;
>         String data    = "";
>         String[] messages = null;
>         
>         try {
>             
>             is  = post.getResponseBodyAsStream();
>             InputStreamReader isr = new InputStreamReader(is);
>             BufferedReader in = new BufferedReader(isr);
> 
>             String line;
>             while ((line = in.readLine()) != null) {
>                 data = data + line;
>             }

Seeing pointless string concatenations like these hurts.
Please investigate the purpose of java.lang.StringBuffer.
JDK 1.5 also has a non-synchronized alternative,
java.lang.StringBuilder, which is even faster. By the way, why
are you reading in by lines if you're not writing by lines? And
you should set the character encoding to be used by the reader.
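
A minimal sketch of what the client side could look like with these points applied, using only the JDK: it reads character blocks instead of lines, uses a StringBuilder, and names the encoding explicitly. The class name, the readMessages() helper and the "\n" value of SEPERATOR are assumptions for illustration; the wire format (count, then each message followed by the separator) is the one written by the servlet code above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

class ReadMessagesSketch {

    // Assumed value; the thread only says SEPERATOR is a static final String.
    static final String SEPERATOR = "\n";

    /** Reads "count SEP msg1 SEP msg2 SEP ..." and returns the messages. */
    static String[] readMessages(InputStream body) {
        StringBuilder data = new StringBuilder();
        char[] block = new char[4096];
        // The encoding must match the charset announced by the server.
        try (Reader in = new InputStreamReader(body, StandardCharsets.UTF_8)) {
            int n;
            while ((n = in.read(block)) != -1) {
                data.append(block, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Quote the separator so it is not interpreted as a regex;
        // limit -1 keeps empty trailing messages.
        String[] parts = data.toString().split(Pattern.quote(SEPERATOR), -1);
        int count = Integer.parseInt(parts[0]);
        return Arrays.copyOfRange(parts, 1, 1 + count);
    }
}
```

In the original client, the stream would come from post.getResponseBodyAsStream(), and post.releaseConnection() would still belong in a finally block.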

>             // SEPERATOR is a static final String
>             messages = data.split(SEPERATOR);
> 
>         } catch (IOException exc) {
>             // handle this later
>             exc.printStackTrace();
>         } finally {
>             post.releaseConnection();
>         }
> 
>         if (messages == null) {
>          // handle this later, throw exception
>         }
>         
>         return messages;
>     }

cheers,
  Roland

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: handle the request entities from server to client ?

Posted by Tomm Krause <da...@gmx.net>.
Hello Roland,
Thanks a lot for your answer.

The following code worked
fine, but I am not sure if
it is a 'best practice'.


---- servlet source (SERVER)----

// ...
    private void requestMessages(
            String[] messages, HttpServletResponse response) {

        StringRequestEntity s1;
        OutputStream out;
        
        response.setContentType("application/octet-stream");

        try {
            // SEPERATOR is a static final String
            s1 = new StringRequestEntity(
                    Integer.toString(messages.length) + SEPERATOR);
            out = response.getOutputStream();
            s1.writeRequest(out);
            
            for (int i = 0; i < messages.length; i++) {
                
                s1 = new StringRequestEntity(messages[i] + SEPERATOR);    
                out = response.getOutputStream();
                s1.writeRequest(out);    
            }
            
        } catch (UnsupportedEncodingException exc1) {
// ...




---- client source (CLIENT)----

// ...
    * @return Servlet response as <code>String[]</code>
     */
    private String[] getResponse(PostMethod post) {
        
        InputStream is = null;
        String data    = "";
        String[] messages = null;
        
        try {
            
            is  = post.getResponseBodyAsStream();
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader in = new BufferedReader(isr);

            String line;
            while ((line = in.readLine()) != null) {
                data = data + line;
            }

            // SEPERATOR is a static final String
            messages = data.split(SEPERATOR);

        } catch (IOException exc) {
            // handle this later
            exc.printStackTrace();
        } finally {
            post.releaseConnection();
        }

        if (messages == null) {
         // handle this later, throw exception
        }
        
        return messages;
    }


bastian





Re: handle the request entities from server to client ?

Posted by Roland Weber <RO...@de.ibm.com>.
Hello Bastian,

> Is there a little example how I can work with the
> RequestEntity classes and a servlet.

You can't. RequestEntity in HttpClient 3.x is a client side
interface which you can't use on the server side. HttpCore
allows the use of RequestEntity on the server side, but it
relies on HttpCore communication primitives which are different
from the Servlet API.
You might find something to clarify the usage of the interfaces
in the test code for HttpClient 3.x and in the sample code for
HttpCore 4.0 respectively.

> I don't know how the data come from a servlet to
> the HttpClient.

It is sent over a TCP/IP or SSL connection. The same
connection over which the request was sent to the server.

> How will the server wrap the data and how can the
> client unwrap it ?

The server announces its "wrapping" strategy in the
Transfer-Encoding header. The usual options are "identity"
(no header) and "chunked". HttpClient automatically
interprets the Transfer-Encoding header and decodes the
chunked encoding if necessary. On the client side, you'll
get the stream as it is sent by the server, except if
there is some proxy in between that applies a content
encoding. If that is the case, the Content-Encoding
header will tell you what encoding has been applied.
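
As an illustration of the last point, here is a minimal JDK-only sketch of how a client might undo a content encoding based on the Content-Encoding header value. It only handles "gzip"; the class and the unwrap(), gzip() and readAll() helpers are hypothetical names for this example and are not part of HttpClient.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ContentEncodingSketch {

    /** Wraps the body according to the Content-Encoding value (may be null). */
    static InputStream unwrap(InputStream body, String contentEncoding) {
        try {
            if (contentEncoding != null
                    && contentEncoding.trim().equalsIgnoreCase("gzip")) {
                return new GZIPInputStream(body);
            }
            return body; // no content encoding: use the stream as sent
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: gzip-compresses a string the way a server or proxy might. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Reads the whole stream and decodes it as UTF-8. */
    static String readAll(InputStream in) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] block = new byte[4096];
        try {
            int n;
            while ((n = in.read(block)) != -1) {
                bos.write(block, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        InputStream body = new ByteArrayInputStream(gzip("hello over the wire"));
        System.out.println(readAll(unwrap(body, "gzip")));
    }
}
```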

hope that helps,
  Roland





handle the request entities from server to client ?

Posted by da...@gmx.net.
Hello,

Is there a small example of how I can work with the
RequestEntity classes and a servlet?
I don't know how the data gets from a servlet to
the HttpClient.
How will the server wrap the data, and how can the
client unwrap it?

Something like a ResponseEntity.


Thanks,

bastian



Re: have question about buffer size

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Sat, 2006-11-11 at 00:13 +0200, Paranoid wrote:
> > Mark,
> >
> > It is not worthwhile investing any efforts into optimizing it because
> > HttpClient 3.x code line will no longer be actively developed past 3.1
> > release. HttpClient 4.0 will be based on HttpCore [1][2] which has much
> > more memory efficient and performant buffering code among other things.
> > Overall I expect HttpClient 4.0 to be 40 to 50% faster and by an order of
> > magnitude more memory efficient than HttpClient 3.1 due to its low-level
> > transport code (HttpCore)
> >
> > [1] http://jakarta.apache.org/httpcomponents/index.html
> > [2] http://jakarta.apache.org/httpcomponents/http-core/index.html
> >
> > Oleg
> 
> so, is it reasonable to start using HttpComponents now if the application is still 
> in development, expecting performance enhancements and reporting 
> bugs, or is it still better to use the beta of HttpClient 3.1?
> 

It depends on your priorities. If performance is the overriding concern
for your application, you may consider using HttpCore. On the other
hand, HttpCore is still ALPHA and its API is likely to change in the
future. If you choose HttpCore at this stage, you will have to go
through the pain of adjusting your application every time the API
undergoes a new revision. This is probably not what you want if you are
developing commercial software.

HttpCore is not meant to be a replacement for HttpClient. It is a set of
low-level components that HTTP services (client, proxy and server side)
can be built upon. Higher-level components for HTTP state management
(cookies), connection management, proxy support, and HTTP authentication
will be developed in HttpClient, not in HttpCore.

To sum this up, if you need speed, flexibility, non-blocking I/O and are
interested in giving us early feedback, consider using HttpCore. If you
need stability and a rich set of client side features, stick to
HttpClient 3.x.


> P.S.: I am a little confused: will httpcomponents and httpclient be separate, or 
> will HttpClient 4.0 be the last version, with future development only 
> for httpcomponents?

HttpComponents is a project, not a product. HttpCore, HttpClient, and
potentially others are sets of components developed and maintained by
the HttpComponents project.

Commons HttpClient (aka HttpClient 3.x) will no longer be enhanced. It
will be eventually superseded by Jakarta HttpClient (aka HttpClient
4.x).   

Hope this makes things somewhat clearer.

Cheers,

Oleg




Re: have question about buffer size

Posted by Paranoid <pa...@ukr.net>.
> Mark,
>
> It is not worthwhile investing any efforts into optimizing it because
> HttpClient 3.x code line will no longer be actively developed past 3.1
> release. HttpClient 4.0 will be based on HttpCore [1][2] which has much
> more memory efficient and performant buffering code among other things.
> Overall I expect HttpClient 4.0 to be 40 to 50% faster and by an order of
> magnitude more memory efficient than HttpClient 3.1 due to its low-level
> transport code (HttpCore)
>
> [1] http://jakarta.apache.org/httpcomponents/index.html
> [2] http://jakarta.apache.org/httpcomponents/http-core/index.html
>
> Oleg

so, is it reasonable to start using HttpComponents now if the application is still 
in development, expecting performance enhancements and reporting 
bugs, or is it still better to use the beta of HttpClient 3.1?

P.S.: I am a little confused: will httpcomponents and httpclient be separate, or 
will HttpClient 4.0 be the last version, with future development only 
for httpcomponents?
-- 
best regards,
        Paranoid



RE: have question about buffer size

Posted by Mark Claassen <mc...@ocie.net>.
I am glad to hear that 4.0 is going to be even faster, but I was more
interested in figuring out what was going on.  I am actually very pleased
with the performance of HttpClient over the java.net stuff.  I have not
tested raw stream performance, but in all other ways it is certainly far
superior.

Mark
 
-----Original Message-----
From: Oleg Kalnichevski [mailto:olegk@apache.org] 
Sent: Thursday, November 09, 2006 12:23 PM
To: HttpClient User Discussion
Subject: RE: have question about buffer size

On Thu, 2006-11-09 at 11:41 -0500, Mark Claassen wrote:
> I just finished looking at that a bit, and I see now that you do 
> read a byte at a time.  Knowing this, I agree with you.
> 
> I am now focusing on chunked input stream.  I don't really know how 
> these things work, but it seems that the chunk size is sent every time 
> (?) and you read the size and then read the data.  Looking at my chunk 
> sizes, I see that they are 8K or less.  This corresponds to the behavior
> I am seeing.
> 
> I must confess that I have no idea if any optimization would be 
> possible here.  I guess you could read more than one chunk at a time, 
> but I don't know if that would help anything.
> 

Mark,

It is not worthwhile investing any efforts into optimizing it because
HttpClient 3.x code line will no longer be actively developed past 3.1
release. HttpClient 4.0 will be based on HttpCore [1][2] which has much more
memory efficient and performant buffering code among other things.
Overall I expect HttpClient 4.0 to be 40 to 50% faster and by an order of
magnitude more memory efficient than HttpClient 3.1 due to its low-level
transport code (HttpCore)

[1] http://jakarta.apache.org/httpcomponents/index.html
[2] http://jakarta.apache.org/httpcomponents/http-core/index.html

Oleg



RE: have question about buffer size

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2006-11-09 at 11:41 -0500, Mark Claassen wrote:
> I just finished looking at that a bit, and I see now that you do read a
> byte at a time.  Knowing this, I agree with you.
> 
> I am now focusing on chunked input stream.  I don't really know how these
> things work, but it seems that the chunk size is sent every time (?) and you
> read the size and then read the data.  Looking at my chunk sizes, I see
> that they are 8K or less.  This corresponds to the behavior I am seeing.
> 
> I must confess that I have no idea if any optimization would be possible
> here.  I guess you could read more than one chunk at a time, but I don't
> know if that would help anything.
> 

Mark,

It is not worthwhile investing any efforts into optimizing it because
HttpClient 3.x code line will no longer be actively developed past 3.1
release. HttpClient 4.0 will be based on HttpCore [1][2] which has much
more memory efficient and performant buffering code among other things.
Overall I expect HttpClient 4.0 to be 40 to 50% faster and by an order of
magnitude more memory efficient than HttpClient 3.1 due to its low-level
transport code (HttpCore)

[1] http://jakarta.apache.org/httpcomponents/index.html
[2] http://jakarta.apache.org/httpcomponents/http-core/index.html

Oleg



RE: have question about buffer size

Posted by Mark Claassen <mc...@ocie.net>.
I just finished looking at that a bit, and I see now that you do read a
byte at a time.  Knowing this, I agree with you.

I am now focusing on chunked input stream.  I don't really know how these
things work, but it seems that the chunk size is sent every time (?) and you
read the size and then read the data.  Looking at my chunk sizes, I see
that they are 8K or less.  This corresponds to the behavior I am seeing.

I must confess that I have no idea if any optimization would be possible
here.  I guess you could read more than one chunk at a time, but I don't
know if that would help anything.
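
For reference, the framing described above can be sketched in a few lines: each chunk starts with its size in hexadecimal on its own line, followed by that many bytes and a CRLF, and a chunk of size 0 ends the body. This is only a minimal illustration that assumes well-formed input and ignores trailers; it is not HttpClient's ChunkedInputStream, and the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ChunkedSketch {

    /** Decodes a chunked body: "<hex size>CRLF<data>CRLF ... 0CRLF CRLF". */
    static byte[] decode(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            while (true) {
                String sizeLine = readLine(in);
                int semi = sizeLine.indexOf(';'); // drop optional chunk extensions
                if (semi >= 0) {
                    sizeLine = sizeLine.substring(0, semi);
                }
                int size = Integer.parseInt(sizeLine.trim(), 16);
                if (size == 0) {
                    break; // last chunk; trailers are ignored in this sketch
                }
                byte[] chunk = new byte[size];
                int got = 0;
                while (got < size) {
                    int n = in.read(chunk, got, size - got);
                    if (n < 0) {
                        throw new EOFException("truncated chunk");
                    }
                    got += n;
                }
                out.write(chunk, 0, size);
                readLine(in); // consume the CRLF that follows the chunk data
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads one line byte by byte, dropping the CR/LF terminator. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wire = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
        byte[] body = decode(new ByteArrayInputStream(
                wire.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(new String(body, StandardCharsets.US_ASCII));
    }
}
```

Because a reader cannot know where the next chunk begins until it has parsed the current size line, a single read() on a chunked stream will typically not return more than what is left of the current chunk, which would be consistent with reads of 8K or less.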

Mark
 
-----Original Message-----
From: Oleg Kalnichevski [mailto:olegk@apache.org] 
Sent: Thursday, November 09, 2006 11:22 AM
To: HttpClient User Discussion
Subject: RE: have question about buffer size

On Thu, 2006-11-09 at 11:00 -0500, Mark Claassen wrote:
> I looked at the source a bit and it looks like the buffer size might 
> be throttled.
> 
> I see that my base stream is an AutoCloseInputStream, which is created 
> in HttpMethodBase.
> Its source is received from HttpConnection.getResponseInputStream().
> It looks like the input stream is actually created in open() where I 
> see this code: (Version 3.0.1)
> 
> (The sndBufSize and rcvBufSize are controlled through 
> HttpConnectionParams)
> 
>             int outbuffersize = socket.getSendBufferSize();
>             if ((outbuffersize > 2048) || (outbuffersize <= 0)) {
>                 outbuffersize = 2048;
>             }
>             int inbuffersize = socket.getReceiveBufferSize();
>             if ((inbuffersize > 2048) || (inbuffersize <= 0)) {
>                 inbuffersize = 2048;
>             }
>             inputStream = new BufferedInputStream(
>                     socket.getInputStream(), inbuffersize);
>             outputStream = new BufferedOutputStream(
>                     socket.getOutputStream(), outbuffersize);
> 
> All this being said, even though the buffer I sent to read() is 64K I 
> always seem to read 8192 bytes at a time.  Looking at this code, I 
> would expect to be reading only 2048 bytes at a time.  I am still 
> trying to figure this one out.
> 


Mark,

The way BufferedInputStream#read(byte[], int, int) is implemented
looks quite intelligent to me. If there is no content in the intermediate
buffer, BufferedInputStream will read directly from the underlying input
stream. So, the size of the intermediate buffer should not matter a lot
when not reading content one byte at a time.

Oleg


> Mark
>  
> -----Original Message-----
> From: Oleg Kalnichevski [mailto:olegk@apache.org]
> Sent: Thursday, November 09, 2006 6:11 AM
> To: HttpClient User Discussion
> Subject: Re: have question about buffer size
> 
> On Wed, 2006-11-08 at 22:27 +0200, Paranoid wrote:
> > > Why should this be a problem? Over the network, maximum segments 
> > > you get are (after removing framing/chunking overhead) probably 
> > > that 1440 bytes in size, and httpclient is then returning you as 
> > > much data as it can efficiently give at any given time.
> > > You just keep on reading all the data, piece by piece.
> > > That's how streams work; they present an abstraction over what may 
> > > be (and often is) packet-based transport channel.
> > 
> > some pieces are about 55 KB in size, and the first read is 75 KB. so why 
> > are all the others about 1.4 KB?
> 
> There can be many factors affecting the content stream fragmentation 
> and the way it is being transferred across the wire. I suggest that 
> you install a traffic analyzer such as Ethereal and look at the 
> packets sent by the target server. If you are absolutely convinced the 
> content gets fragmented somewhere inside HttpClient, I'll dig into the 
> code and try to pinpoint the problem.
> 
> Oleg
> 
> > and after that, saving the inputStream into a FileOutputStream loads the CPU 
> > heavily... this time it is really a problem... tested with 
> > different buffer sizes, and with a greater buffer size I have lower CPU
> > load.
> > P.S.: download speed is about 5 MB/s...
> 
> 


RE: have question about buffer size

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2006-11-09 at 11:00 -0500, Mark Claassen wrote:
> I looked at the source a bit and it looks like the buffer size might be
> throttled.
> 
> I see that my base stream is an AutoCloseInputStream, which is created in
> HttpMethodBase.
> Its source is received from HttpConnection.getResponseInputStream().
> It looks like the input stream is actually created in open() where I see
> this code: (Version 3.01)
> 
> (The sndBufSize and rcvBufSize are controlled through HttpConnectionParams)
> 
>             int outbuffersize = socket.getSendBufferSize();
>             if ((outbuffersize > 2048) || (outbuffersize <= 0)) {
>                 outbuffersize = 2048;
>             }
>             int inbuffersize = socket.getReceiveBufferSize();
>             if ((inbuffersize > 2048) || (inbuffersize <= 0)) {
>                 inbuffersize = 2048;
>             }
>             inputStream = new BufferedInputStream(socket.getInputStream(),
>                     inbuffersize);
>             outputStream = new BufferedOutputStream(socket.getOutputStream(),
>                     outbuffersize);
> 
> All this being said, even though the buffer I sent to read() is 64K I always
> seem to read 8192 bytes at a time.  Looking at this code, I would expect to
> be reading only 2048 bytes at a time.  I am still trying to figure this one
> out.
> 


Mark,

The way BufferedInputStream#read(byte[], int, int) is implemented
looks quite intelligent to me. If there is no content in the
intermediate buffer, BufferedInputStream will read directly from the
underlying input stream. So the size of the intermediate buffer should
not matter much, as long as you are not reading content one byte at a time.
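
A minimal sketch of that pass-through behaviour, using only java.io.
ChunkedSource is a hypothetical stand-in for the socket stream, and the
8192-byte chunk it hands out per call is an assumed value chosen for
illustration, not something taken from HttpClient. The sketch records
how many bytes BufferedInputStream asks the underlying stream for.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class PassThroughDemo {

    /** Hypothetical socket stand-in: returns at most 'chunk' bytes per bulk read. */
    static final class ChunkedSource extends InputStream {
        private final int chunk;
        private long remaining;
        int lastRequested = -1; // 'len' argument of the most recent bulk read

        ChunkedSource(long total, int chunk) {
            this.remaining = total;
            this.chunk = chunk;
        }

        @Override
        public int read() {
            if (remaining <= 0) return -1;
            remaining--;
            return 0;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            lastRequested = len;
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = (int) Math.min(Math.min(len, chunk), remaining);
            remaining -= n;
            return n; // content is irrelevant here; the array stays zero-filled
        }
    }

    /** Returns {len requested from the source, bytes returned to the caller}. */
    static int[] run(int internalBufferSize, int callerBufferSize, int chunk) {
        ChunkedSource source = new ChunkedSource(1L << 20, chunk);
        try (BufferedInputStream in = new BufferedInputStream(source, internalBufferSize)) {
            int n = in.read(new byte[callerBufferSize], 0, callerBufferSize);
            return new int[] { source.lastRequested, n };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = run(2048, 64 * 1024, 8192);
        System.out.println("requested from source: " + r[0] + ", returned: " + r[1]);
    }
}
```

With a 2048-byte intermediate buffer and a 64K caller buffer, the source
sees the full 64K request, so the amount returned is decided by the
source rather than by the intermediate buffer. Only when the caller's
buffer is smaller than the intermediate one does the 2048-byte size come
into play.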

Oleg


> Mark
>  
> -----Original Message-----
> From: Oleg Kalnichevski [mailto:olegk@apache.org] 
> Sent: Thursday, November 09, 2006 6:11 AM
> To: HttpClient User Discussion
> Subject: Re: have question about buffer size
> 
> On Wed, 2006-11-08 at 22:27 +0200, Paranoid wrote:
> > > Why should this be a problem? Over the network, maximum segments you 
> > > get are (after removing framing/chunking overhead) probably that 
> > > 1440 bytes in size, and httpclient is then returning you as much 
> > > data as it can efficiently give at any given time.
> > > You just keep on reading all the data, piece by piece.
> > > That's how streams work; they present an abstraction over what may 
> > > be (and often is) packet-based transport channel.
> > 
> > some peaces are about 55 KB size, and first read is 75 KB. so why are 
> > all other about 1.4 KB?
> 
> There can be many factors affecting the content stream fragmentation and the
> way it is being transferred across the wire. I suggest that you install a
> traffic analyzer such as Ethereal and look at the packets sent by the target
> server. If you are absolutely convinced the content gets fragmented
> somewhere inside HttpClient, I'll dig into the code and try to pinpoint the
> problem.
> 
> Oleg
> 
> > and after - saving inputStream into FileOutputStream load CPU for 
> > great number... this time it is really a problem... tested with 
> > different buffer sized, and with greater buffer size have lower CPU load.
> > P.S.: speed of downloading about 5 MB/s...
> 
> 




RE: have question about buffer size

Posted by Mark Claassen <mc...@ocie.net>.
I looked at the source a bit and it looks like the buffer size might be
throttled.

I see that my base stream is an AutoCloseInputStream, which is created in
HttpMethodBase.
Its source is received from HttpConnection.getResponseInputStream().
It looks like the input stream is actually created in open() where I see
this code: (Version 3.01)

(The sndBufSize and rcvBufSize are controlled through HttpConnectionParams)

            int outbuffersize = socket.getSendBufferSize();
            if ((outbuffersize > 2048) || (outbuffersize <= 0)) {
                outbuffersize = 2048;
            }
            int inbuffersize = socket.getReceiveBufferSize();
            if ((inbuffersize > 2048) || (inbuffersize <= 0)) {
                inbuffersize = 2048;
            }
            inputStream = new BufferedInputStream(socket.getInputStream(),
                    inbuffersize);
            outputStream = new BufferedOutputStream(socket.getOutputStream(),
                    outbuffersize);

All this being said, even though the buffer I pass to read() is 64K, I always
seem to read 8192 bytes at a time.  Looking at this code, I would expect to
be reading only 2048 bytes at a time.  I am still trying to figure this one
out.

Mark
 
-----Original Message-----
From: Oleg Kalnichevski [mailto:olegk@apache.org] 
Sent: Thursday, November 09, 2006 6:11 AM
To: HttpClient User Discussion
Subject: Re: have question about buffer size

On Wed, 2006-11-08 at 22:27 +0200, Paranoid wrote:
> > Why should this be a problem? Over the network, maximum segments you 
> > get are (after removing framing/chunking overhead) probably that 
> > 1440 bytes in size, and httpclient is then returning you as much 
> > data as it can efficiently give at any given time.
> > You just keep on reading all the data, piece by piece.
> > That's how streams work; they present an abstraction over what may 
> > be (and often is) packet-based transport channel.
> 
> some peaces are about 55 KB size, and first read is 75 KB. so why are 
> all other about 1.4 KB?

There can be many factors affecting the content stream fragmentation and the
way it is being transferred across the wire. I suggest that you install a
traffic analyzer such as Ethereal and look at the packets sent by the target
server. If you are absolutely convinced the content gets fragmented
somewhere inside HttpClient, I'll dig into the code and try to pinpoint the
problem.

Oleg

> and after - saving inputStream into FileOutputStream load CPU for 
> great number... this time it is really a problem... tested with 
> different buffer sized, and with greater buffer size have lower CPU load.
> P.S.: speed of downloading about 5 MB/s...



Re: have question about buffer size

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2006-11-08 at 22:27 +0200, Paranoid wrote:
> > Why should this be a problem? Over the network,
> > maximum segments you get are (after removing
> > framing/chunking overhead) probably that 1440 bytes in
> > size, and httpclient is then returning you as much
> > data as it can efficiently give at any given time.
> > You just keep on reading all the data, piece by piece.
> > That's how streams work; they present an abstraction
> > over what may be (and often is) packet-based transport
> > channel.
> 
> some peaces are about 55 KB size, and first read is 75 KB. so why are all 
> other about 1.4 KB?

There can be many factors affecting the content stream fragmentation and
the way it is being transferred across the wire. I suggest that you
install a traffic analyzer such as Ethereal and look at the packets sent
by the target server. If you are absolutely convinced the content gets
fragmented somewhere inside HttpClient, I'll dig into the code and try
to pinpoint the problem.

Oleg

> and after - saving inputStream into FileOutputStream load CPU for great 
> number... this time it is really a problem... tested with different buffer 
> sized, and with greater buffer size have lower CPU load.
> P.S.: speed of downloading about 5 MB/s...




Re: have question about buffer size

Posted by Paranoid <pa...@ukr.net>.
> Why should this be a problem? Over the network,
> maximum segments you get are (after removing
> framing/chunking overhead) probably that 1440 bytes in
> size, and httpclient is then returning you as much
> data as it can efficiently give at any given time.
> You just keep on reading all the data, piece by piece.
> That's how streams work; they present an abstraction
> over what may be (and often is) packet-based transport
> channel.

Some pieces are about 55 KB in size, and the first read is 75 KB. So why are 
all the others about 1.4 KB?
Also, saving the inputStream into a FileOutputStream loads the CPU heavily, 
and this time it really is a problem. I tested with different buffer sizes, 
and a greater buffer size gives a lower CPU load.
P.S.: the download speed is about 5 MB/s.
-- 
best regards,
        Paranoid



RE: have question about buffer size

Posted by Mark Claassen <mc...@ocie.net>.
It is my understanding that the network card receives information
asynchronously from the reading stream and can buffer input on its own.  Is
this not true?  If it does buffer, then even though the packets are small, it
seems like there might be more than just one packet to read on successive
read() calls.  Is this not true?
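
Whatever the answer, a caller who wants large chunks does not have to
depend on how much a single read() returns. A minimal sketch, assuming
plain java.io: fill() keeps calling read() until the caller's buffer is
full or the stream ends. The trickle() helper is hypothetical and only
simulates a source that hands out small pieces.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class FillBuffer {

    /** Reads until buf is full or EOF; returns the number of bytes stored in buf. */
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream before the buffer was full
            }
            total += n;
        }
        return total;
    }

    /** Hypothetical source that never returns more than 'chunk' bytes per read. */
    static InputStream trickle(byte[] data, int chunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    /** Convenience wrapper for experiments: fills a bufLen buffer from dataLen bytes. */
    static int fillFromTrickle(int dataLen, int chunk, int bufLen) {
        try (InputStream in = trickle(new byte[dataLen], chunk)) {
            return fill(in, new byte[bufLen]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fillFromTrickle(10000, 1440, 4096));
    }
}
```

Later Java versions (9 and up) offer InputStream.readNBytes(byte[], int,
int), which does the same job. Note that filling a large buffer this way
trades latency for fewer, bigger writes downstream; it does not make the
network deliver data any faster.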

Mark
 
-----Original Message-----
From: Tatu Saloranta [mailto:cowtowncoder@yahoo.com] 
Sent: Wednesday, November 08, 2006 11:18 AM
To: HttpClient User Discussion; Paranoid
Subject: Re: have question about buffer size

--- Paranoid <pa...@ukr.net> wrote:

> have src file in attachment.
> description: creating buffer with 20 MB size.
> reading and appending readed size into StringBuilder. after reading - 
> show StringBuilder contents. on my machine with Mustang b101 we read 
> about 1440 bytes every time, instead of read 20 MB. have great 
> perfomance problem and need REAL buffer, but dont know what to do...

Why should this be a problem? Over the network, maximum segments you get are
(after removing framing/chunking overhead) probably that 1440 bytes in size,
and httpclient is then returning you as much data as it can efficiently give
at any given time.
You just keep on reading all the data, piece by piece.
That's how streams work; they present an abstraction over what may be (and
often is) packet-based transport channel.

-+ Tatu +-




 




Re: have question about buffer size

Posted by Tatu Saloranta <co...@yahoo.com>.
--- Paranoid <pa...@ukr.net> wrote:

> have src file in attachment.
> description: creating buffer with 20 MB size.
> reading and appending readed size into
> StringBuilder. after reading - show StringBuilder
> contents. on my machine with Mustang b101 we read
> about 1440 
> bytes every time, instead of read 20 MB. have great
> perfomance problem and need REAL buffer, but dont
> know what to do...

Why should this be a problem? Over the network, the
largest segments you get are (after removing
framing/chunking overhead) probably about 1440 bytes in
size, and httpclient is then returning you as much
data as it can efficiently give at any given time.
You just keep on reading all the data, piece by piece.
That's how streams work; they present an abstraction
over what may be (and often is) a packet-based transport
channel.
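
The "keep on reading, piece by piece" loop can be sketched like this,
assuming plain java.io. The demo() helper and its 1440-byte segment
size are hypothetical and only simulate a source that returns one
segment per read; with HttpClient the input would be the response body
stream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.io.UncheckedIOException;

public class StreamCopy {

    /** Copies everything from in to out, however small each read turns out to be. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }

    /** Returns {bytes copied, number of reads that returned data, 1 if content matches}. */
    static long[] demo(int dataLen, int segment, int bufferSize) {
        byte[] data = new byte[dataLen];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        final int[] dataReads = { 0 };
        // Simulated source: at most one 'segment' of bytes per read call.
        InputStream in = new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                int n = super.read(b, off, Math.min(len, segment));
                if (n > 0) {
                    dataReads[0]++;
                }
                return n;
            }
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            long total = copy(in, out, bufferSize);
            boolean same = Arrays.equals(data, out.toByteArray());
            return new long[] { total, dataReads[0], same ? 1 : 0 };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = demo(10000, 1440, 64 * 1024);
        System.out.println("copied " + r[0] + " bytes in " + r[1] + " reads");
    }
}
```

The size of the caller's buffer only sets an upper bound on each read.
The loop is correct whether a read returns 1440 bytes or 55 KB.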

-+ Tatu +-




 