Posted to users@subversion.apache.org by st...@studio.imagemagick.org on 2005/11/13 03:23:25 UTC

Making multiple file copies is prohibitively slow

When we make a large number of individual file copies using the copy_path
argument of add_file(), the process becomes prohibitively slow.  For
example, we have a repository of 16384 documents (in a single revision).
Making 32768 copies of them takes over an hour on a 3 GHz Intel box running
Fedora.  Is there a way to speed this process up?

Background:  We're creating a document management system.  We use the
SVN API to add the documents as follows (revision 1):

  repository/A
  repository/B
  repository/C
  ...

However, we require multiple views of the repository, so we create copies
like this (revision 2):

  repository/view-by-filename/holiday.txt (points to repository/A)
  repository/view-by-filename/king.doc (points to repository/B)
  repository/view-by-filename/movie.ppt (points to repository/C)
  repository/view-by-hash/A (points to repository/A)
  repository/view-by-hash/B (points to repository/B)
  repository/view-by-hash/C (points to repository/C)
  ...

Here is the relevant code.  Assume the repository/view-by-filename and
repository/view-by-hash directories were previously added:

  for (i=0; i < 16384; i++)
  {
    /* Copy document i into the view-by-filename view... */
    svn_status=repository_info->editor->add_file(filename[i],repository->root,
      url[i],repository->revision+1,pool,&file_info);
    svn_status=repository_info->editor->close_file(file_info,NULL,pool);

    /* ...and into the view-by-hash view. */
    svn_status=repository_info->editor->add_file(hash[i],repository->root,
      url[i],repository->revision+1,pool,&file_info);
    svn_status=repository_info->editor->close_file(file_info,NULL,pool);
  }
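
For reference, the loop runs inside a single commit editor drive, roughly
like this (a simplified sketch of our surrounding code; edit_baton stands
in for whatever baton we got back with the editor, and repository->root is
the baton returned by open_root()):

  /* Sketch of the surrounding drive: the whole batch of copies is
     committed as a single new revision when close_edit() is called. */
  svn_status = repository_info->editor->open_root(edit_baton,
    repository->revision, pool, &repository->root);

  /* ... the add_file()/close_file() loop shown above ... */

  svn_status = repository_info->editor->close_edit(edit_baton, pool);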

We tried adding the copies in the same transaction that creates the
documents, but we get an error because the copy-from paths are not
available until that transaction is committed.

We tried committing the transaction after each document is added, but the
copy history kept getting larger and larger.  After several thousand
documents the overhead of the copy history became larger than the documents
themselves, and adding new documents became slower and slower.

We considered using svn_fs_copy()/svn_fs_revision_link(), but we're using
svn_ra_open(), so we do not have the svn_fs_root_t structures those
functions require, and we are not sure they would make a difference in
speed anyway.
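
If we could open the repository locally rather than through svn_ra_open(),
we imagine the FS-level version would look roughly like this (an untested
sketch; the repository path and the source_path/filename_path/hash_path
arrays are placeholders for our A/B/C documents and the two view paths, and
error handling and log-message revprops are omitted):

  #include "svn_pools.h"
  #include "svn_fs.h"
  #include "svn_repos.h"

  svn_repos_t *repos;
  svn_fs_t *fs;
  svn_fs_txn_t *txn;
  svn_fs_root_t *txn_root, *rev_root;
  svn_revnum_t youngest, new_rev;
  const char *conflict;
  apr_pool_t *iterpool;
  int i;

  /* Requires direct (local) access to the repository, unlike
     svn_ra_open(). */
  svn_repos_open(&repos, "/path/to/repository", pool);
  fs = svn_repos_fs(repos);
  svn_fs_youngest_rev(&youngest, fs, pool);

  /* Batch all 32768 copies into one transaction, i.e. one new revision;
     a real commit would also set svn:log/svn:author on the txn. */
  svn_fs_begin_txn(&txn, fs, youngest, pool);
  svn_fs_txn_root(&txn_root, txn, pool);
  svn_fs_revision_root(&rev_root, fs, youngest, pool);

  /* Clear a per-iteration subpool so memory use stays bounded. */
  iterpool = svn_pool_create(pool);
  for (i = 0; i < 16384; i++)
  {
    svn_pool_clear(iterpool);
    svn_fs_copy(rev_root, source_path[i], txn_root, filename_path[i],
      iterpool);
    svn_fs_copy(rev_root, source_path[i], txn_root, hash_path[i], iterpool);
  }
  svn_pool_destroy(iterpool);

  svn_repos_fs_commit_txn(&conflict, repos, &new_rev, txn, pool);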

As an aside, we noticed that if you add a directory that already exists,
an error is returned as expected, but there is a memory leak: our process
grew to 2.5 GB.  We changed our code to first open the directory and, only
if it does not exist, add it.
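
Our open-first workaround looks roughly like this (a sketch against the
same editor; path is a placeholder, and the key point is clearing the
returned error before falling back to add_directory()):

  svn_error_t *err;
  void *dir_baton;

  /* Try to open the directory first and only add it if the open fails;
     clearing the error avoids the unbounded growth we were seeing. */
  err = repository_info->editor->open_directory(path, repository->root,
    repository->revision, pool, &dir_baton);
  if (err != NULL)
  {
    svn_error_clear(err);
    err = repository_info->editor->add_directory(path, repository->root,
      NULL, SVN_INVALID_REVNUM, pool, &dir_baton);
  }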
