You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Julian Foad <ju...@gmail.com> on 2015/09/05 12:01:51 UTC

svnsync crash on empty repo

Hi, Bert.

On trunk (released versions are unaffected), "svnsync sync" with an
empty source repo fails an assertion in svn_ra_replay_range():

subversion/libsvn_ra/ra_loader.c' line 1198: assertion failed
(SVN_IS_VALID_REVNUM(start_revision) &&
SVN_IS_VALID_REVNUM(end_revision) && start_revision <= end_revision &&
SVN_IS_VALID_REVNUM(low_water_mark))

because start_revision is 1 and end_revision is 0.

The assertion was added in r1665480
<http://svn.apache.org/viewvc?revision=1665480&view=revision>, with
the log msg mentioning "Add assertions here that already apply to some
ra layers."

One possible fix is to avoid calling it with an empty revision range, like this:

Index: subversion/svnsync/svnsync.c
===================================================================
--- subversion/svnsync/svnsync.c    (revision 1701278)
+++ subversion/svnsync/svnsync.c    (working copy)
@@ -1551,3 +1551,3 @@ do_synchronize(svn_ra_session_t *to_sess

-  if (from_latest < last_merged)
+  if (from_latest <= last_merged)
     return SVN_NO_ERROR;

That makes the caller return early when there are no revisions to
sync. The only other real caller (svnrdump) already uses a similar
condition.

However, I think we need to continue allowing the historical usage of
svn_ra_replay_range() for backward compatibility. Should the assertion
should be changed like this?

-                 && start_revision <= end_revision
+                 && start_revision <= (end_revision + 1)

I haven't yet seen where the problem or restriction existed in some RA
layers, so I don't know if that's a good enough fix on its own. My new
svnsync test (empty repo) passes over all RA layers with this change.

The attached patch contains both possible fixes, a log msg and a
regression test.

(I found this by running tests with the "--dump-load-cross-check"
option, which I have not done for a while. Doing so also reports
several other test failures, mostly just because those tests use
deliberately invalid data.)

- Julian

Re: svnsync crash on empty repo

Posted by Julian Foad <ju...@gmail.com>.

I (Julian Foad) wrote:
> On trunk (released versions are unaffected), "svnsync sync" with an
> empty source repo fails an assertion in svn_ra_replay_range():

In fact this problem is not specific to an empty repository, but to
any repository that is already fully synchronized. An empty repository
just happened to be the first case I came across.

- Julian


> subversion/libsvn_ra/ra_loader.c' line 1198: assertion failed
> (SVN_IS_VALID_REVNUM(start_revision) &&
> SVN_IS_VALID_REVNUM(end_revision) && start_revision <= end_revision &&
> SVN_IS_VALID_REVNUM(low_water_mark))
>
> because start_revision is 1 and end_revision is 0.
>
> The assertion was added in r1665480
> <http://svn.apache.org/viewvc?revision=1665480&view=revision>, with
> the log msg mentioning "Add assertions here that already apply to some
> ra layers."
>
> One possible fix is to avoid calling it with an empty revision range, like this:
>
> Index: subversion/svnsync/svnsync.c
> ===================================================================
> --- subversion/svnsync/svnsync.c    (revision 1701278)
> +++ subversion/svnsync/svnsync.c    (working copy)
> @@ -1551,3 +1551,3 @@ do_synchronize(svn_ra_session_t *to_sess
>
> -  if (from_latest < last_merged)
> +  if (from_latest <= last_merged)
>      return SVN_NO_ERROR;
>
> That makes the caller return early when there are no revisions to
> sync. The only other real caller (svnrdump) already uses a similar
> condition.
>
> However, I think we need to continue allowing the historical usage of
> svn_ra_replay_range() for backward compatibility. Should the assertion
> should be changed like this?
>
> -                 && start_revision <= end_revision
> +                 && start_revision <= (end_revision + 1)
>
> I haven't yet seen where the problem or restriction existed in some RA
> layers, so I don't know if that's a good enough fix on its own. My new
> svnsync test (empty repo) passes over all RA layers with this change.
>
> The attached patch contains both possible fixes, a log msg and a
> regression test.
>
> (I found this by running tests with the "--dump-load-cross-check"
> option, which I have not done for a while. Doing so also reports
> several other test failures, mostly just because those tests use
> deliberately invalid data.)
>
> - Julian

RE: svnsync crash on empty repo

Posted by Bert Huijben <be...@qqmail.nl>.

I'll write a larger reply tomorrow (after looking through the code a bit more), but I think this is about how I feel about this. Great summary!

Bert

-----Original Message-----
From: "Julian Foad" <ju...@gmail.com>
Sent: ‎6-‎9-‎2015 15:15
To: "Bert Huijben" <be...@qqmail.nl>
Cc: "dev" <de...@subversion.apache.org>
Subject: Re: svnsync crash on empty repo

I have had an idea about what may be bothering you. Is it that you
don't like us to have any interface where a valid revnum value could
be greater than HEAD, because it's counter-intuitive and we rarely do
it and in most cases a valid revnum cannot be greater than head?

If that's the case, I have sympathy for that reasoning. For new APIs,
I would certainly agree.

For replay_range we could:
  * require a non-empty range, breaking strict backward compatibility;
  * make it accept an empty range and return without doing anything
(choosing to implement the test either as (start_rev > end_rev) or as
(start_rev == end_rev + 1));
  * rev the API, and make the old version accept an empty range, and
the new version reject an empty range;
  * rev the API, and make the old version accept an empty range as
before, and make the new version take an "exclusive" instead of an
"inclusive" start revision, and accept an empty range (which would be
represented with start_rev == end_rev and neither of these would ever
be greater than HEAD).

To make the (existing) API take an empty range while ensuring that a
greater-than-head revnum will never be passed through to the RA layer
implementations, the check and the early return could be implemented
in svn_ra_replay_range() directly.

We can of course decide to break API compatibility and update the
caller, but we should not do that without considering whether we have
better options.

- Julian

On 6 September 2015 at 09:48, Julian Foad <ju...@gmail.com> wrote:
> Bert Huijben wrote:
>> Julian Foad wrote:
>>> Bert wrote:
>>>> I don't think we should fix this with a 'revision+1' to explicitly allow
>>>> many bad ranges,
>>>
>>> Why?
>>>
>>> An empty range isn't inherently "bad". Allowing an empty range isn't
>>> bad. Being able to specify an empty range in many different ways isn't
>>> bad. Changing a released API to make a previously no-op case now throw
>>> an assertion failure breaks our compatibility guarantees. That is bad.
>>
>> An empty range isn't bad, but your patch appears to also explicitly allow undefined ranges, if I read it correctly.
>>
>> I would call A range of r1 upto r0 defined (empty), but I would call a range or r4 upto r3 with current HEAD r3 undefined.
>>
>> We use r1 and r0 as special in quite a few places where we really just want to point at the initial revision,
>
> I am totally against the idea of treating r1 as a special value.
>
> I didn't think we used r1 as special anywhere. Can you point to any of
> those places, please?
>
> (There was a case added to the mergeinfo parsing a few years ago that
> treated r1 as special -- treating "change r1" as an invalid merge
> source -- and I removed that special case because it was illogical.
> That is the only case I recall seeing.)
>
>> I see no problem with that but I don't want to introduce a definition of 'HEAD+1' as there is no we can declare stable behavior on that... It may even change partway through an operation, the moment that revision is committed.
>
> I am not proposing that "HEAD+1" is a special value. I propose that
> "r4 up to r3" specifies an empty range because there are no integers R
> where (R >= 4 and R <= 3). It has nothing to do with the value of
> HEAD, so doesn't matter if HEAD changes.
>
> Logically a range of revisions (in the sense of *changes*) is defined
> by a starting state (aka snapshot) and an ending state. When we want
> the 'replay_range' API to replay the single change between (state) r2
> and (state) r3, the API is defined in such a way that we have to pass
> (starting_state + 1, ending_state), that is (r3, r3).
>
> Now consider if we request to replay an empty range. To replay the
> empty range between (state) r3 and (state) r3, we have to pass in
> parameters (r3+1, r3). That doesn't mean we're actually going to
> dereference the state (r3+1). If HEAD is r3, dereferencing r4 would be
> an error. But the API only takes this input (r4) in order to refer to
> a starting state r3, because that's how the API was defined. It
> doesn't dereference r4, so no problem. Like how a pointer to the end
> of a memory block in C is defined as valid even if the byte it points
> to is outside the allocated block, as long as we don't dereference it.
>
> The only place where the value of HEAD matters is when you want to
> check that the revision range we specified is no higher than HEAD.
> Think of it logically. If we specified an empty range using (R+1, R)
> then we have not specified a range that goes higher than HEAD even if
> (R+1) == (HEAD+1). Therefore, HEAD+1 is acceptable as the "start"
> parameter *iff* the "end" parameter is is less than "start".
>
>
>> (It would be nicer if we used SVN_INVALID_REVNUM in more cases as the magic MIN value, but at least ra_svn will assert if we would try to do that, as it doesn't allow transferring r-1 as a normal serialized revision. So fixing that requires a timemachine)
>>
>> Svnrdump had a bug where it actually passed HEAD+1 as base revision of a commit in a specific case... It is better to catch these errors on the client than to rely on not having concurrent commits to define behavior of our tools.
>
> I'm not suggesting to rely on not having concurrent commits!
>
> BTW what's your opinion on maintaining backward compatibility of this
> API? One option, if you want to disallow specifying an empty range, is
> to create a new API and keep the old one as it was.
>
> - Julian

Re: svnsync crash on empty repo

Posted by Julian Foad <ju...@gmail.com>.

I have had an idea about what may be bothering you. Is it that you
don't like us to have any interface where a valid revnum value could
be greater than HEAD, because it's counter-intuitive and we rarely do
it and in most cases a valid revnum cannot be greater than head?

If that's the case, I have sympathy for that reasoning. For new APIs,
I would certainly agree.

For replay_range we could:
  * require a non-empty range, breaking strict backward compatibility;
  * make it accept an empty range and return without doing anything
(choosing to implement the test either as (start_rev > end_rev) or as
(start_rev == end_rev + 1));
  * rev the API, and make the old version accept an empty range, and
the new version reject an empty range;
  * rev the API, and make the old version accept an empty range as
before, and make the new version take an "exclusive" instead of an
"inclusive" start revision, and accept an empty range (which would be
represented with start_rev == end_rev and neither of these would ever
be greater than HEAD).

To make the (existing) API take an empty range while ensuring that a
greater-than-head revnum will never be passed through to the RA layer
implementations, the check and the early return could be implemented
in svn_ra_replay_range() directly.

We can of course decide to break API compatibility and update the
caller, but we should not do that without considering whether we have
better options.

- Julian

On 6 September 2015 at 09:48, Julian Foad <ju...@gmail.com> wrote:
> Bert Huijben wrote:
>> Julian Foad wrote:
>>> Bert wrote:
>>>> I don't think we should fix this with a 'revision+1' to explicitly allow
>>>> many bad ranges,
>>>
>>> Why?
>>>
>>> An empty range isn't inherently "bad". Allowing an empty range isn't
>>> bad. Being able to specify an empty range in many different ways isn't
>>> bad. Changing a released API to make a previously no-op case now throw
>>> an assertion failure breaks our compatibility guarantees. That is bad.
>>
>> An empty range isn't bad, but your patch appears to also explicitly allow undefined ranges, if I read it correctly.
>>
>> I would call A range of r1 upto r0 defined (empty), but I would call a range or r4 upto r3 with current HEAD r3 undefined.
>>
>> We use r1 and r0 as special in quite a few places where we really just want to point at the initial revision,
>
> I am totally against the idea of treating r1 as a special value.
>
> I didn't think we used r1 as special anywhere. Can you point to any of
> those places, please?
>
> (There was a case added to the mergeinfo parsing a few years ago that
> treated r1 as special -- treating "change r1" as an invalid merge
> source -- and I removed that special case because it was illogical.
> That is the only case I recall seeing.)
>
>> I see no problem with that but I don't want to introduce a definition of 'HEAD+1' as there is no we can declare stable behavior on that... It may even change partway through an operation, the moment that revision is committed.
>
> I am not proposing that "HEAD+1" is a special value. I propose that
> "r4 up to r3" specifies an empty range because there are no integers R
> where (R >= 4 and R <= 3). It has nothing to do with the value of
> HEAD, so doesn't matter if HEAD changes.
>
> Logically a range of revisions (in the sense of *changes*) is defined
> by a starting state (aka snapshot) and an ending state. When we want
> the 'replay_range' API to replay the single change between (state) r2
> and (state) r3, the API is defined in such a way that we have to pass
> (starting_state + 1, ending_state), that is (r3, r3).
>
> Now consider if we request to replay an empty range. To replay the
> empty range between (state) r3 and (state) r3, we have to pass in
> parameters (r3+1, r3). That doesn't mean we're actually going to
> dereference the state (r3+1). If HEAD is r3, dereferencing r4 would be
> an error. But the API only takes this input (r4) in order to refer to
> a starting state r3, because that's how the API was defined. It
> doesn't dereference r4, so no problem. Like how a pointer to the end
> of a memory block in C is defined as valid even if the byte it points
> to is outside the allocated block, as long as we don't dereference it.
>
> The only place where the value of HEAD matters is when you want to
> check that the revision range we specified is no higher than HEAD.
> Think of it logically. If we specified an empty range using (R+1, R)
> then we have not specified a range that goes higher than HEAD even if
> (R+1) == (HEAD+1). Therefore, HEAD+1 is acceptable as the "start"
> parameter *iff* the "end" parameter is is less than "start".
>
>
>> (It would be nicer if we used SVN_INVALID_REVNUM in more cases as the magic MIN value, but at least ra_svn will assert if we would try to do that, as it doesn't allow transferring r-1 as a normal serialized revision. So fixing that requires a timemachine)
>>
>> Svnrdump had a bug where it actually passed HEAD+1 as base revision of a commit in a specific case... It is better to catch these errors on the client than to rely on not having concurrent commits to define behavior of our tools.
>
> I'm not suggesting to rely on not having concurrent commits!
>
> BTW what's your opinion on maintaining backward compatibility of this
> API? One option, if you want to disallow specifying an empty range, is
> to create a new API and keep the old one as it was.
>
> - Julian

Re: svnsync crash on empty repo

Posted by Julian Foad <ju...@gmail.com>.

Bert Huijben wrote:
> Julian Foad wrote:
>> Bert wrote:
>>> I don't think we should fix this with a 'revision+1' to explicitly allow
>>> many bad ranges,
>>
>> Why?
>>
>> An empty range isn't inherently "bad". Allowing an empty range isn't
>> bad. Being able to specify an empty range in many different ways isn't
>> bad. Changing a released API to make a previously no-op case now throw
>> an assertion failure breaks our compatibility guarantees. That is bad.
>
> An empty range isn't bad, but your patch appears to also explicitly allow undefined ranges, if I read it correctly.
>
> I would call A range of r1 upto r0 defined (empty), but I would call a range or r4 upto r3 with current HEAD r3 undefined.
>
> We use r1 and r0 as special in quite a few places where we really just want to point at the initial revision,

I am totally against the idea of treating r1 as a special value.

I didn't think we used r1 as special anywhere. Can you point to any of
those places, please?

(There was a case added to the mergeinfo parsing a few years ago that
treated r1 as special -- treating "change r1" as an invalid merge
source -- and I removed that special case because it was illogical.
That is the only case I recall seeing.)

> I see no problem with that but I don't want to introduce a definition of 'HEAD+1' as there is no we can declare stable behavior on that... It may even change partway through an operation, the moment that revision is committed.

I am not proposing that "HEAD+1" is a special value. I propose that
"r4 up to r3" specifies an empty range because there are no integers R
where (R >= 4 and R <= 3). It has nothing to do with the value of
HEAD, so doesn't matter if HEAD changes.

Logically a range of revisions (in the sense of *changes*) is defined
by a starting state (aka snapshot) and an ending state. When we want
the 'replay_range' API to replay the single change between (state) r2
and (state) r3, the API is defined in such a way that we have to pass
(starting_state + 1, ending_state), that is (r3, r3).

Now consider if we request to replay an empty range. To replay the
empty range between (state) r3 and (state) r3, we have to pass in
parameters (r3+1, r3). That doesn't mean we're actually going to
dereference the state (r3+1). If HEAD is r3, dereferencing r4 would be
an error. But the API only takes this input (r4) in order to refer to
a starting state r3, because that's how the API was defined. It
doesn't dereference r4, so no problem. Like how a pointer to the end
of a memory block in C is defined as valid even if the byte it points
to is outside the allocated block, as long as we don't dereference it.

The only place where the value of HEAD matters is when you want to
check that the revision range we specified is no higher than HEAD.
Think of it logically. If we specified an empty range using (R+1, R)
then we have not specified a range that goes higher than HEAD even if
(R+1) == (HEAD+1). Therefore, HEAD+1 is acceptable as the "start"
parameter *iff* the "end" parameter is is less than "start".

> (It would be nicer if we used SVN_INVALID_REVNUM in more cases as the magic MIN value, but at least ra_svn will assert if we would try to do that, as it doesn't allow transferring r-1 as a normal serialized revision. So fixing that requires a timemachine)
>
> Svnrdump had a bug where it actually passed HEAD+1 as base revision of a commit in a specific case... It is better to catch these errors on the client than to rely on not having concurrent commits to define behavior of our tools.

I'm not suggesting to rely on not having concurrent commits!

BTW what's your opinion on maintaining backward compatibility of this
API? One option, if you want to disallow specifying an empty range, is
to create a new API and keep the old one as it was.

- Julian

RE: svnsync crash on empty repo

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@gmail.com]
> Sent: zaterdag 5 september 2015 15:00
> To: Bert Huijben <be...@qqmail.nl>
> Cc: dev <de...@subversion.apache.org>
> Subject: Re: svnsync crash on empty repo
> 
> Bert wrote:
> > I don't think we should fix this with a 'revision+1' to explicitly allow
> > many bad ranges,
> 
> Why?
> 
> An empty range isn't inherently "bad". Allowing an empty range isn't
> bad. Being able to specify an empty range in many different ways isn't
> bad. Changing a released API to make a previously no-op case now throw
> an assertion failure breaks our compatibility guarantees. That is bad.

An empty range isn't bad, but your patch appears to also explicitly allow undefined ranges, if I read it correctly.

I would call A range of r1 upto r0 defined (empty), but I would call a range or r4 upto r3 with current HEAD r3 undefined.

We use r1 and r0 as special in quite a few places where we really just want to point at the initial revision, I see no problem with that but I don't want to introduce a definition of 'HEAD+1' as there is no we can declare stable behavior on that... It may even change partway through an operation, the moment that revision is committed.
(It would be nicer if we used SVN_INVALID_REVNUM in more cases as the magic MIN value, but at least ra_svn will assert if we would try to do that, as it doesn't allow transferring r-1 as a normal serialized revision. So fixing that requires a timemachine)

Svnrdump had a bug where it actually passed HEAD+1 as base revision of a commit in a specific case... It is better to catch these errors on the client than to rely on not having concurrent commits to define behavior of our tools.

	Bet

Re: svnsync crash on empty repo

Posted by Julian Foad <ju...@gmail.com>.

Bert wrote:
> I don't think we should fix this with a 'revision+1' to explicitly allow
> many bad ranges,

Why?

An empty range isn't inherently "bad". Allowing an empty range isn't
bad. Being able to specify an empty range in many different ways isn't
bad. Changing a released API to make a previously no-op case now throw
an assertion failure breaks our compatibility guarantees. That is bad.

I don't like having undocumented behaviours and having to decide
whether they're to be supported or just accidental, but that's what we
have. In this case I think the behaviour is consistent and logical,
and is demonstrably used by a client (our own 'svn'). In this case, we
should decide that it is supported (and we should document it from now
on).

I explained why I think we *should* fix this. Please will you explain
why you think we shouldn't.

> but I do think we should specifically fix the r0 case for
> empty repositories.

Note that, as per my follow-up email, it's not specific to r0, but
applies to any sync when the mirror is already up to date.

I think we should change the caller to avoid calling replay_range with
an empty range, but mainly for aesthetic reasons within the caller (it
already has an early return but it is off by one), not because there's
anything wrong with making such a call.

(I think we should also add a regression test that does call
replay_range with an empty range, to verify that it continues to
work.)

- Julian

RE: svnsync crash on empty repo

Posted by be...@qqmail.nl.

I don't think we should fix this with a 'revision+1' to explicitly allow many bad ranges, but I do think we should specifically fix the r0 case for empty repositories.

Bert

Sent from Mail for Windows 10



From: Julian Foad
Sent: zaterdag 5 september 2015 12:02
To: dev;Bert Huijben
Subject: svnsync crash on empty repo


Hi, Bert.

On trunk (released versions are unaffected), "svnsync sync" with an
empty source repo fails an assertion in svn_ra_replay_range():

subversion/libsvn_ra/ra_loader.c' line 1198: assertion failed
(SVN_IS_VALID_REVNUM(start_revision) &&
SVN_IS_VALID_REVNUM(end_revision) && start_revision <= end_revision &&
SVN_IS_VALID_REVNUM(low_water_mark))

because start_revision is 1 and end_revision is 0.

The assertion was added in r1665480
<http://svn.apache.org/viewvc?revision=1665480&view=revision>, with
the log msg mentioning "Add assertions here that already apply to some
ra layers."

One possible fix is to avoid calling it with an empty revision range, like this:

Index: subversion/svnsync/svnsync.c
===================================================================
--- subversion/svnsync/svnsync.c    (revision 1701278)
+++ subversion/svnsync/svnsync.c    (working copy)
@@ -1551,3 +1551,3 @@ do_synchronize(svn_ra_session_t *to_sess

-  if (from_latest < last_merged)
+  if (from_latest <= last_merged)
     return SVN_NO_ERROR;

That makes the caller return early when there are no revisions to
sync. The only other real caller (svnrdump) already uses a similar
condition.

However, I think we need to continue allowing the historical usage of
svn_ra_replay_range() for backward compatibility. Should the assertion
should be changed like this?

-                 && start_revision <= end_revision
+                 && start_revision <= (end_revision + 1)

I haven't yet seen where the problem or restriction existed in some RA
layers, so I don't know if that's a good enough fix on its own. My new
svnsync test (empty repo) passes over all RA layers with this change.

The attached patch contains both possible fixes, a log msg and a
regression test.

(I found this by running tests with the "--dump-load-cross-check"
option, which I have not done for a while. Doing so also reports
several other test failures, mostly just because those tests use
deliberately invalid data.)

- Julian