Posted to dev@subversion.apache.org by Greg Hudson <gh...@MIT.EDU> on 2001/02/05 06:35:25 UTC

Re: CVS update: subversion/subversion/client main.c

Greg Stein committed:
>   +  /* ### this function must be fixed to do an apr_stat() for SIZE,
>   +     ### alloc the buffer, then read the file into the buffer. Using
>   +     ### an svn_string_t means quadratic memory usage: start with
>   +     ### BUFSIZE, allocate 2*BUFSIZE, then alloc 4*BUFSIZE, etc. */

This doesn't yield quadratic memory usage; in the limit, this yields
memory usage of double the size of the file.  (Try counting from the
last allocation back to the first.)
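
(That is, the allocations form the series M + M/2 + M/4 + ... +
BUFSIZE, where M is the final, largest allocation; that sum is less
than 2M, and M is itself proportional to the file size.)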

We can still do better, of course; a factor of two is a big deal for a
potentially unbounded file.  (Not that I've checked what this file is
or how likely it is to get large.)
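
For the record, the stat-then-read approach the comment asks for would
look something like the sketch below.  (Illustrative only: it's written
against the modern APR names apr_stat / apr_file_open /
apr_file_read_full, which postdate this thread, and error handling is
abbreviated.)

    /* Sketch of the stat/alloc/read approach; modern APR names,
       error handling abbreviated. */
    #include <apr_pools.h>
    #include <apr_file_info.h>
    #include <apr_file_io.h>

    static apr_status_t
    read_entire_file(const char *fname, char **contents,
                     apr_size_t *len, apr_pool_t *pool)
    {
      apr_finfo_t finfo;
      apr_file_t *file;
      apr_status_t status;

      /* One stat to learn the size ... */
      status = apr_stat(&finfo, fname, APR_FINFO_SIZE, pool);
      if (status)
        return status;

      /* ... one allocation of exactly that size (plus a terminator) ... */
      *len = (apr_size_t) finfo.size;
      *contents = apr_palloc(pool, *len + 1);

      /* ... and one read straight into the buffer: no doubling, no
         copying, no dead half-size buffers left behind in the pool. */
      status = apr_file_open(&file, fname, APR_READ, APR_OS_DEFAULT, pool);
      if (status)
        return status;
      status = apr_file_read_full(file, *contents, *len, NULL);
      if (status)
        return status;
      (*contents)[*len] = '\0';

      return apr_file_close(file);
    }

The obvious caveat is that the file can grow or shrink between the stat
and the read, so the stat is advisory; apr_file_read_full at least
won't overrun the buffer.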

Re: CVS update: subversion/subversion/client main.c

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Feb 05, 2001 at 02:06:20AM -0500, Greg Hudson wrote:
> > Ah... rereading your comment now that I've written the
> > formula. You're saying that formula converges on 2N, right? Ah. So
> > yes... O(N) memory. The copying is something like O(N log N)
> > though. Those first ten bytes get copied a bunch of times.
> 
> The time taken is also O(n); after all, no piece of memory is written
> to or copied from more than once, so the time taken can't be more than
> a constant factor times the amount of memory used.

Hmm... Boy, it's been too long since I did combinatorial analysis :-)

It doesn't feel right... seems that there should be a log(N) in there
somewhere, but I see your point. If we've shown the memory is O(N), and we
touch each of those memory locations at most twice (written once, then
read once for a copy), then we have O(2N) (== O(N)) for memory copying.
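
(Same counting trick as for the memory bound: each doubling copies the
whole old buffer, so the bytes copied total BUFSIZE + 2*BUFSIZE + ... +
M/2, which is less than M, the size of the final allocation. Linear in
the file size again.)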

Just did the math on that series progression: memory is N * (2 - 1/2^i),
which converges on 2N. Heh

> > Back to your original point: can you think of a good term for
> > describing the behavior? [that I can slap into that comment?]
> 
> You're just trying to save a constant factor, so there's no special
> mathematical term.  I think "suboptimal" is what you're hunting
> for. :)

*laf* ... yup. Seeing that it is only a constant factor, then you bet.

> (Although if the file is small, maybe we shouldn't be caring.)

In this case, I think it is. But we should be wary of algorithms like
this if the file might be large (or the files numerous).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: CVS update: subversion/subversion/client main.c

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Feb 05, 2001 at 01:35:25AM -0500, Greg Hudson wrote:
> Greg Stein committed:
> >   +  /* ### this function must be fixed to do an apr_stat() for SIZE,
> >   +     ### alloc the buffer, then read the file into the buffer. Using
> >   +     ### an svn_string_t means quadratic memory usage: start with
> >   +     ### BUFSIZE, allocate 2*BUFSIZE, then alloc 4*BUFSIZE, etc. */
> 
> This doesn't yield quadratic memory usage; in the limit, this yields
> memory usage of double the size of the file.  (Try counting from the
> last allocation back to the first.)
> 
> We can still do better, of course; a factor of two is a big deal for a
> potentially unbounded file.  (Not that I've checked what this file is
> or how likely it is to get large.)

My comment wasn't clear, which I noticed after a second read. I added a new
sentence to the end: "The pools keep each of those allocs around."

Let's say that BUFSIZE is 10 bytes, and you read a 1000 byte file:

*) alloc BUFSIZE (10) bytes. copy the read data in (10 bytes).
*) double to 20 bytes. copy the read data in.
*) double to 40 bytes. copy in two chunks of data.
*) double to 80 bytes. copy in four chunks.
*) double to 160. copy 8 chunks.
*) double to 320. copy 16 chunks.
*) double to 640. copy 32 chunks.
*) double to 1280. copy the remaining 36 chunks.

Total memory allocated: 1280+640+320+160+80+40+20+10 = 2550 bytes.
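
A quick standalone simulation of that doubling (hypothetical sketch,
not svn code) reproduces those totals and counts the grow-time copies
while it's at it:

    /* Simulate reading a 1000-byte file in 10-byte chunks into a
       doubling buffer.  "allocated" sums every generation of the
       buffer, since the pool keeps each one alive; "copied" counts
       the bytes moved each time the buffer doubles. */
    #include <stdio.h>

    int main(void)
    {
      long file_size = 1000, bufsize = 10;
      long cap = bufsize;          /* current buffer capacity */
      long len = 0;                /* file bytes held so far */
      long allocated = bufsize;    /* total bytes ever allocated */
      long copied = 0;             /* bytes moved by grow-time copies */
      long pos, chunk;

      for (pos = 0; pos < file_size; pos += chunk)
        {
          chunk = file_size - pos < bufsize ? file_size - pos : bufsize;
          if (len + chunk > cap)
            {
              copied += len;       /* old contents move over */
              cap *= 2;            /* double ... */
              allocated += cap;    /* ... and the old buffer lingers */
            }
          len += chunk;
        }

      /* Prints: allocated 2550 bytes, copied 1270 bytes */
      printf("allocated %ld bytes, copied %ld bytes\n", allocated, copied);
      return 0;
    }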

If there is a better name for that behavior than "quadratic", then I'd be
happy to be educated. I think that is the appropriate term here.

[ N + N/2 + N/4 + N/8 ... any mathematicians around? ]


Ah... rereading your comment now that I've written the formula. You're
saying that formula converges on 2N, right? Ah. So yes... O(N) memory. The
copying is something like O(N log N) though. Those first ten bytes get
copied a bunch of times.

Well, whatever. It can clearly be improved :-)

In this instance, the file won't be large. It is used for reading options
from a file :-)


Back to your original point: can you think of a good term for describing the
behavior? [that I can slap into that comment?]

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/