Posted to dev@subversion.apache.org by Robert Pluim <rp...@bigfoot.com> on 2003/02/23 16:58:17 UTC

why is svn add slow?

Because it rereads _and_ rewrites the entries file for every file
added, and locks and unlocks the directory it's working on for every
file, even when all the scheduled adds are in the same directory.
I've done a few tests, and changing to locking, reading and writing
once per dir speeds up svn add by about 30x.  The question I have is,
what is the best way to improve the current code:

1) Change svn_client_add to remember the directory it was last working
   on, making sure it's cached the entries file from the previous time
   around? (where would it cache it?  The pool it's passed might not
   exist next time round).

2) Add an svn_client_add_in_dir, where you pass an apr_array where all
   the targets are guaranteed to be in the same dir, and make
   svn_cl__add call that?

3) Some other way to make the entries file be cached?  I haven't fully
   understood how the set field of svn_wc_adm_access_t is used yet.

I'm not sure either way. (1) feels icky, since it requires caching
unbeknownst to the client code. (2) feels cleaner, but requires
svn_cl__add to sort through all the scheduled adds, splitting them
based on parent directory.

What do you think?

Robert
-- 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: why is svn add slow?

Posted by Robert Pluim <rp...@bigfoot.com>.
Philip Martin writes:
 > Robert Pluim <rp...@bigfoot.com> writes:

 > It's low priority (for me personally) because adding files is not a
 > bottleneck when using version control, even if the process is
 > slow.  It just doesn't happen that often.

I know it doesn't happen often, but it just seems extremely wasteful
to me to reparse the entries file when it's not necessary (and svn add
is not the only culprit).

 > There have been requests in
 > the past to change the 'svn add' behavior so that it doesn't stop on
 > an already versioned item but continues to consider any children
 > (assuming --non-recursive was not given and respecting the ignored
 > names).  As far as I recall there were no objections to this, just
 > that nobody has so far implemented it.  Would that solve your
 > particular use case?
 >

I don't think that would help me.

 > > 1) Change svn_client_add to remember the directory it was last working
 > >    on, making sure it's cached the entries file from the previous time
 > >    around? (where would it cache it?  The pool it's passed might not
 > >    exist next time round).
 > 
 > I don't really like this approach; it makes the client interface more
 > difficult to use if the application has to get involved with the access
 > batons.
 >

Exactly, which is why I didn't like it.

 > > 2) Add an svn_client_add_in_dir, where you pass an apr_array where all
 > >    the targets are guaranteed to be in the same dir, and make
 > >    svn_cl__add call that?
 > 
 > I don't like the idea of a second add interface either.
 >

OK, we'll strike that one as well then ;-)

 > > 3) Some other way to make the entries file be cached?  I haven't fully
 > >    understood how the set field of svn_wc_adm_access_t is used yet.
 > 
 > There is some documentation in notes/entries-caching; it went there
 > when entries caching was planned but not implemented. Possibly some of
 > it should move into lock.c.
 >

I'll have a read of that and see if any bells start ringing.  Thanks
for the pointer.

 > I would prefer a single add interface such as
 > 
 > svn_error_t *
 > svn_client_add (const apr_array_header_t *paths,
 >                 svn_boolean_t recursive,
 >                 svn_client_ctx_t *ctx,
 >                 apr_pool_t *pool);
 > 
 > and have the client library reuse the access batons.

Hmm, so we pass the responsibility for iterating over the paths down
to svn_client_add?  It could stick the access batons in a hash keyed
off the directory, that looks like it would work.

Let me go off and play with it for a bit, and see what happens.

Robert
-- 


Re: why is svn add slow?

Posted by Philip Martin <ph...@codematters.co.uk>.
Robert Pluim <rp...@bigfoot.com> writes:

> Because it rereads _and_ rewrites the entries file for every file
> added, and locks and unlocks the directory it's working on for every
> file, even when all the scheduled adds are in the same directory.
> I've done a few tests, and changing to locking, reading and writing
> once per dir speeds up svn add by about 30x.

It's low priority (for me personally) because adding files is not a
bottleneck when using version control, even if the process is
slow.  It just doesn't happen that often.  There have been requests in
the past to change the 'svn add' behavior so that it doesn't stop on
an already versioned item but continues to consider any children
(assuming --non-recursive was not given and respecting the ignored
names).  As far as I recall there were no objections to this, just
that nobody has so far implemented it.  Would that solve your
particular use case?

> The question I have is, what is the best way to improve the current
> code:
>
> 1) Change svn_client_add to remember the directory it was last working
>    on, making sure it's cached the entries file from the previous time
>    around? (where would it cache it?  The pool it's passed might not
>    exist next time round).

I don't really like this approach; it makes the client interface more
difficult to use if the application has to get involved with the access
batons.

> 2) Add an svn_client_add_in_dir, where you pass an apr_array where all
>    the targets are guaranteed to be in the same dir, and make
>    svn_cl__add call that?

I don't like the idea of a second add interface either.

> 3) Some other way to make the entries file be cached?  I haven't fully
>    understood how the set field of svn_wc_adm_access_t is used yet.

There is some documentation in notes/entries-caching; it went there
when entries caching was planned but not implemented. Possibly some of
it should move into lock.c.

> I'm not sure either way. (1) feels icky, since it requires caching
> unbeknownst to the client code. (2) feels cleaner, but requires
> svn_cl__add to sort through all the scheduled adds, splitting them
> based on parent directory.
> 
> What do you think?

I would prefer a single add interface such as

svn_error_t *
svn_client_add (const apr_array_header_t *paths,
                svn_boolean_t recursive,
                svn_client_ctx_t *ctx,
                apr_pool_t *pool);

and have the client library reuse the access batons.

-- 
Philip Martin
