Posted to dev@subversion.apache.org by Larry McVoy <lm...@bitmover.com> on 2010/11/21 18:05:03 UTC

svn update select(2)

Hi hackers,

We've got a customer that needs to move some data out of SVN and we wrote a
tool for them to do so.

Unfortunately, that tool is slow because svn update seems to artificially
slow itself down, I've straced it and at the end there is a select() that
waits until one second has passed by.  I'm guessing there is some sort of
transaction log that doesn't want to be updated more often than that?

Before I waste more time (spent a half hour in the sources this morning
between dealing with kids), is there a flag to turn this off?  With it 
on it makes SVN appear to be 12x slower than BK and I know it's not that
slow.

Thanks,

--lm

Re: svn update select(2)

Posted by Larry McVoy <lm...@bitmover.com>.
On Mon, Nov 22, 2010 at 09:35:12AM -0500, C. Michael Pilato wrote:
> On 11/21/2010 07:59 PM, Greg Stein wrote:
> > Hey Larry!
> > 
> > Good to hear from you. Been quite a while :-P
> > 
> > Yes, the delay is there to deal with filesystem timestamp resolution
> > issues. I don't recall the specifics of *why*, but Bad Things can
> > happen if a filesystem doesn't have enough resolution.
> 
> It has to do with our timestamp-based mod detection.  On systems with
> insufficient granularity, it's easy (via operations done in quick
> succession) to get the WC-stored last-known-unmodified timestamp to match
> the timestamp of a quite-modified working file, which causes Subversion to
> not notice that the file is modified.

Huh.  So the problem is that svn uses timestamps for modification detection?
That's going to be fast but inaccurate even with your one second sleep.
Think NFS - the server sets the timestamp on creation, client sets it on
modification.  So an out of sync (time wise) server and an editor that 
unlinks/writes will hose you.

But good to understand the reasoning, that helps, and yes, with the var
things are speedy again.

Thanks for your help, pretty pleasant of you guys.

Cheers,
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

Re: svn update select(2)

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 11/21/2010 07:59 PM, Greg Stein wrote:
> Hey Larry!
> 
> Good to hear from you. Been quite a while :-P
> 
> Yes, the delay is there to deal with filesystem timestamp resolution
> issues. I don't recall the specifics of *why*, but Bad Things can
> happen if a filesystem doesn't have enough resolution.

It has to do with our timestamp-based mod detection.  On systems with
insufficient granularity, it's easy (via operations done in quick
succession) to get the WC-stored last-known-unmodified timestamp to match
the timestamp of a quite-modified working file, which causes Subversion to
not notice that the file is modified.
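
A toy model of that race may make it concrete.  This is a Python sketch
with made-up names (truncate_ms, looks_modified, next_tick_ms), not
Subversion's actual check.  It assumes the check compares only the recorded
mtime against the current one, and it models a coarse filesystem by rounding
timestamps down to the filesystem's granularity.

```python
def truncate_ms(ts_ms: int, granularity_ms: int) -> int:
    """Round a timestamp (in ms) down to the filesystem's granularity."""
    return (ts_ms // granularity_ms) * granularity_ms


def looks_modified(checkout_ms: int, edit_ms: int, granularity_ms: int) -> bool:
    """Timestamp-only check: modified only if the stored mtime changed."""
    recorded = truncate_ms(checkout_ms, granularity_ms)  # what the WC remembers
    current = truncate_ms(edit_ms, granularity_ms)       # what the file shows now
    return recorded != current


def next_tick_ms(ts_ms: int, granularity_ms: int) -> int:
    """Earliest time at which a new write gets a different stored mtime."""
    return (ts_ms // granularity_ms + 1) * granularity_ms


# 1-second granularity: an edit 500 ms after the checkout goes unnoticed.
missed = not looks_modified(100_200, 100_700, 1000)
# Waiting for the next tick before returning closes that window.
caught = looks_modified(100_200, next_tick_ms(100_200, 1000), 1000)
print(missed, caught)
```

In this model the wait at the end of an update corresponds to sleeping
until next_tick_ms(), so that any later edit lands on a different stored
timestamp.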

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn update select(2)

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 11/21/2010 08:17 PM, Larry McVoy wrote:
> We are trying to avoid parsing the dump format unless we have to.  Just 
> want to keep it simple.

Larry, Subversion provides C API support for parsing a dumpstream and
calling a user-supplied collection of callback functions with the harvested
data.  "Simple" might still be out of reach with this approach once you deal
with learning a new API and grabbing all the dependencies and messing with
APR pools and ....  But if you do find yourself interested, dig around in
svn_repos.h for this:

svn_error_t *
svn_repos_parse_dumpstream2(svn_stream_t *stream,
                            const svn_repos_parse_fns2_t *parse_fns,
                            void *parse_baton,
                            svn_cancel_func_t cancel_func,
                            void *cancel_baton,
                            apr_pool_t *pool);
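
For a feel of the callback style without the C dependencies, here is a
rough Python sketch.  It is not the Subversion API: parse_dump_records and
on_record are hypothetical names, and it assumes only the basic dumpfile
layout (a block of "Key: value" headers, a blank line, then Content-length
bytes of body).  It does not interpret properties or deltas.

```python
import io
from typing import BinaryIO, Callable, Dict


def parse_dump_records(stream: BinaryIO,
                       on_record: Callable[[Dict[str, str], bytes], None]) -> None:
    """Call on_record(headers, body) for each header block in a dumpstream."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of stream
        if line == b"\n":
            continue                    # blank separator between records
        headers: Dict[str, str] = {}
        while line and line != b"\n":   # header block ends at a blank line
            key, _, value = line.decode("utf-8").rstrip("\n").partition(": ")
            headers[key] = value
            line = stream.readline()
        # Records without a Content-length header carry no body.
        body = stream.read(int(headers.get("Content-length", "0")))
        on_record(headers, body)


# Tiny hand-written sample, not output from a real repository.
sample = (b"SVN-fs-dump-format-version: 2\n\n"
          b"UUID: 1234\n\n"
          b"Revision-number: 1\n"
          b"Prop-content-length: 10\n"
          b"Content-length: 10\n\n"
          b"PROPS-END\n\n")
records = []
parse_dump_records(io.BytesIO(sample), lambda h, b: records.append((h, b)))
```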


-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn update select(2)

Posted by Larry McVoy <lm...@bitmover.com>.
Hey Greg,

Thanks a lot for the reply, very cool of you.

On the long time no talk, I crawled under a rock after getting beat bloody
by the open source guys, something about helping people and getting
shit on for it wasn't working for me :)  Slowly crawling back out.

Thanks for the variable, and the name rocks.  BitKeeper has one like
that too, can't remember, but it is something like

	I_KNOW_THIS_CORRUPTS_MY_TREE

We are trying to avoid parsing the dump format unless we have to.  Just 
want to keep it simple.

BTW, our importer is written in L which is a scripting language we did.
Open source though we haven't released it officially (if you want it
ask, we'll give you the source).  It's sort of C + perl.  Here's the 
code for the svn2bk stuff.  Feedback welcome.

Thanks again for the info - that helps.

--lm


/*
 * This is a little SVN to BK importer written in L.
 *
 * It does not attempt to handle
 *	renames (other than copy/delete pretty much like SVN)
 *	multiple branches
 *
 * Usage: 
 *	# Initial import:
 *	svn2bk [-hHOST] [-r<start>[..<stop>]] <svn_url> <dir>
 *	# Incremental import in the top dir of the repo:
 *	svn2bk -i [-hHOST] [-r..<stop>]
 *
 * -hHOST	set the hostname used for checkins
 * -i		incremental, import whatever has been added
 * -rstart	start at this commit in the svn repo
 * -rstop	stop at this one (default HEAD)
 */

typedef struct {
	string	user;		// username who did the check in
	string	date;		// 2007-05-13 03:21:54 -0700
	string	cmts[];		// array of comments for the commit
} delta;

delta	log{int};		// cache of the entire log
int	revs[];			// ordered list of revs for this branch
string	q = "q";		// -v turns this off and makes bk noisy

void
main(int ac, string av[])
{
	string	c, host, url, dir;
	int	start, stop;	// if set, do this range.
	int	i, incremental;

	if (0) ac = 0;	// lint

	while (defined(c = getopt(av, "h:ir:v", undef))) {
		switch (c) {
		    case "h": host = optarg; break;
		    case "i": incremental = 1; break;
		    case "r":
			if (optarg =~ /(.+)\.\.(.+)/) {
				start = (int)$1;
				stop = (int)$2;
			} else if (optarg =~ /^\.\.(.+)/) {
				stop = (int)$1;
			} else {
				start = (int)optarg;
			}
			break;
		    case "v": q = ""; break;
		    default: usage(-1);
		}
	}
	url = av[optind++];
	dir = av[optind];
	ifndef (start) start = 1;
	ifdef (host) setenv("BK_HOST", host);
	setenv("CLOCK_DRIFT", "1");
	unless (defined(getenv("BK_HOST"))) usage(0);
	unless ((defined(url) && defined(dir)) || defined(incremental)) {
		usage(1);
	}
	ifdef (incremental) {
		unless (isdir(".bk") && isdir(".svn")) {
			fprintf(stderr, "Not at a BK/SVN root\n");
			exit(1);
		}
	} else unless (defined(setup(start, url, dir))) {
		usage(2);
	}
	ifndef (getlog(incremental, ++start, stop)) usage(3);
	for (i = 0; defined(revs[i]); i++) {
		assert(defined(cset(revs[i])));
	}
}

/*
 * Create an empty bk repo and the initial svn repo
 * We want to end up with .bk next to .svn
 */
int
setup(int start, string url, string dir)
{
	string	bk = "${dir}/bk";
	FILE	f;

	ifndef (system("svn co -q -r${start} ${url} ${dir}")) return (undef);

	/*
	 * Set up a repo inside the svn repo
	 */
	f = fopen("${dir}/.bk_config", "w");
	fprintf(f, "checkout:edit\n");
	fprintf(f, "clockskew:on\n");
	fprintf(f, "partial_check:on\n");
	fclose(f);
	ifndef (system("bk setup -f -c${dir}/.bk_config ${bk}")) return (undef);
	system("tar -C${bk} -cf- . | tar -C${dir} -xf-");
	unlink("${dir}/.bk_config");
	system("rm -rf ${bk}");
	if (isdir(bk)) die("rm failed\n");
	chdir(dir);
	if (isdir("bk")) die("rm failed2\n");
	setenv("BK_CONFIG", "clockskew=1!;compression:off!");
	mkdir("BitKeeper/tmp/dotbk");
	setenv("BK_DOTBK", "BitKeeper/tmp/dotbk");

	// Prune the top level one, we'll grep out the others
	system("bk ignore '.svn -prune'");

	system("bk -cxU | grep -v '/\.svn/' | bk ci -a${q}ly'SVN ${start}' -");
	system("bk _eula -a");
	system("bk commit -${q}y'SVN ${start}'");
	return (0);
}

/*
 * Import a SVN commit.
 * We get the updates, then 
 * - for each file that is not checked out, svn deleted it so we delete it
 * - for each modified/extra we check those in with the comment/user/date
 *   from the log message.
 */
int
cset(int rev)
{
	FILE	f;
	string	buf, tmp;

	fprintf(stderr, "=== SVN ${rev} ===\n");
	ifndef (system("svn update -q -r${rev}")) return (undef);
	tmp = "BitKeeper/tmp/comments";
	f = fopen(tmp, "w");
	foreach (buf in log{rev}.cmts) {
		fprintf(f, "%s\n", buf);
	}
	fclose(f);
	setenv("BK_USER", log{rev}.user);
	setenv("BK_DATE_TIME_ZONE", log{rev}.date);
	system("bk -U^G rm -f");
	system("bk -xcU | grep -v '/\.svn/' | bk ci -a${q}lY${tmp} -");
	f = fopen(tmp, "a");
	fprintf(f, "SVN: %d\n", rev);
	fclose(f);
	system("bk commit -${q}Y${tmp}");
	return (0);
}

/*
 * Load up the log, we'll use it for our commits.
 *	------------------------------------------------------------------------
 *	r59 | mcccol | 2007-04-17 18:23:39 -0700 (Tue, 17 Apr 2007) | 4 lines
 *	
 *	removed logging, started using Debug.error
 *	
 *	------------------------------------------------------------------------
 *	r60 | mcccol | 2007-04-17 18:25:08 -0700 (Tue, 17 Apr 2007) | 4 lines
 *	
 *	* Added fixbad to utf8 to repair damaged utf8
 *	* made regexps variables to preserver their regexp intrep
 *
 * etc.
 */
int
getlog(int incremental, int start, int stop)
{
	FILE	f;
	int	i, rev;
	string	cmts[];
	string	buf;

	ifdef (incremental) {
		start = (int)`svn info | grep Revision: | awk '{print $NF}'`;
		start++;
	}
	ifdef (stop) {
		if (stop <= start) {
			fprintf(stderr, "Already up to or past %d\n", stop);
			exit(1);
		}
		f = popen("svn log -r${start}:${stop} 2>@stderr", "r");
	} else {
		f = popen("svn log -r${start}:HEAD 2>@stderr", "r");
	}
	unless (defined(buf = <f>) && (buf =~ /^[-]+$/)) {
done:		fprintf(stderr, "Seems like you are up to date.\n");
		return (0);
	}

	while (!eof(f)) {
		unless (defined(buf = <f>)) {
			assert(eof(f));
			break;
		}
		unless (buf =~ /^r(\d+) \| ([^|]+) \| ([^(]+)/) {
			die("expected rev/date: ${buf}\n");
		}
		rev = (int)$1;
		push(&revs, rev);
		log{rev}.user = (string)$2;
		log{rev}.date = (string)$3;
		buf = <f>;	// toss the blank line
		undef(cmts);	// toss previous comments
		while (defined(buf = <f>)) {
			if ((length(buf) == 72) && (buf =~ /^[-]+$/)) break;
			push(&cmts, buf);
		}

		/*
		 * Lose trailing blank lines, they serve no purpose.
		 */
		for (i = length(cmts)-1; i >= 0; i--) {
			unless (cmts[i] =~ /^\s*$/) break;
			cmts[i] = undef;
		}
		log{rev}.cmts = cmts;
	}
	pclose(f);
	pop(&revs);	// we did this one in setup
	unless (length(revs)) goto done;
	return (0);
}

void
usage(int which)
{
	fprintf(stderr, "Barfed on %d.\n", which);
	exit(1);
}
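
For readers who don't have L handy, the header-matching step in getlog()
can be sketched in Python with the same regular expression.  The function
name parse_log_header is made up for this example, and unlike the L code
it strips the trailing space from the date.

```python
import re
from typing import Optional, Tuple

# Same pattern as getlog(): "r<rev> | <user> | <date> (<pretty date>) | N lines"
LOG_HEADER = re.compile(r"^r(\d+) \| ([^|]+) \| ([^(]+)")


def parse_log_header(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (rev, user, date) for an svn log header line, else None."""
    m = LOG_HEADER.match(line)
    if m is None:
        return None
    # The date group runs up to the "(" and so ends in a space; strip it.
    return int(m.group(1)), m.group(2), m.group(3).strip()


header = parse_log_header(
    "r59 | mcccol | 2007-04-17 18:23:39 -0700 (Tue, 17 Apr 2007) | 4 lines")
print(header)
```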

Re: svn update select(2)

Posted by Greg Stein <gs...@gmail.com>.
Hey Larry!

Good to hear from you. Been quite a while :-P

Yes, the delay is there to deal with filesystem timestamp resolution
issues. I don't recall the specifics of *why*, but Bad Things can
happen if a filesystem doesn't have enough resolution.

You can disable the delay with:

$ export SVN_I_LOVE_CORRUPTED_WORKING_COPIES_SO_DISABLE_SLEEP_FOR_TIMESTAMPS=yes

before you run your update/export/whatever processes.
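
If you would rather not export it for the whole shell, a wrapper can set
the variable only for the svn child process.  A minimal Python sketch,
assuming svn is on the PATH; the helper names (env_without_timestamp_sleep,
svn_update) are made up for this example:

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

NO_SLEEP_VAR = "SVN_I_LOVE_CORRUPTED_WORKING_COPIES_SO_DISABLE_SLEEP_FOR_TIMESTAMPS"


def env_without_timestamp_sleep(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment (default: ours) and add the no-sleep variable."""
    env = dict(os.environ if base is None else base)
    env[NO_SLEEP_VAR] = "yes"
    return env


def svn_update(rev: int, cwd: str = ".") -> None:
    """Run 'svn update -q -r<rev>' with the sleep disabled for that call only."""
    subprocess.run(["svn", "update", "-q", f"-r{rev}"],
                   cwd=cwd, env=env_without_timestamp_sleep(), check=True)
```

The caller's own environment is left untouched, so the variable does not
leak into anything else that edits the working copy.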

I would also like to point out that svn 1.7 will have a new tool named
"svnrdump" that produces an svn dumpfile from a *remote* repository.
If you guys have tools to process dump files, then svnrdump could be
very helpful for those who don't have server/admin access to produce a
dumpfile. It is a pretty efficient tool, and was originally inspired as
a way to get a dumpfile for a fast-import into Git.

The 1.7 release is still a few months away, but if you're gung-ho,
then you could build it from trunk and deploy it for your data export
process.

Cheers,
-g

On Sun, Nov 21, 2010 at 13:05, Larry McVoy <lm...@bitmover.com> wrote:
> Hi hackers,
>
> We've got a customer that needs to move some data out of SVN and we wrote a
> tool for them to do so.
>
> Unfortunately, that tool is slow because svn update seems to artificially
> slow itself down, I've straced it and at the end there is a select() that
> waits until one second has passed by.  I'm guessing there is some sort of
> transaction log that doesn't want to be updated more often than that?
>
> Before I waste more time (spent a half hour in the sources this morning
> between dealing with kids), is there a flag to turn this off?  With it
> on it makes SVN appear to be 12x slower than BK and I know it's not that
> slow.
>
> Thanks,
>
> --lm
>