You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Bert Huijben <be...@vmoo.com> on 2010/08/23 21:14:26 UTC

RE: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py


> -----Original Message-----
> From: philip@apache.org [mailto:philip@apache.org]
> Sent: maandag 23 augustus 2010 13:23
> To: commits@subversion.apache.org
> Subject: svn commit: r988074 - in
> /subversion/trunk/subversion/tests/cmdline: svntest/wc.py
> upgrade_tests.py
> 
> Author: philip
> Date: Mon Aug 23 11:22:51 2010
> New Revision: 988074
> 
> URL: http://svn.apache.org/viewvc?rev=988074&view=rev
> Log:
> Verify pristines after upgrade so that the tests fail as expected
> in single-db.
> 
> * subversion/tests/cmdline/upgrade_tests.py
>   (check_pristine): New.
>   (basic_upgrade, upgrade_with_externals, upgrade_1_5_body): Check
> pristines.
> 
> * subversion/tests/cmdline/upgrade_tests.py
>   (text_base_path): Restore MD5 support removed in r960036.

I think the real fix would be to upgrade to SHA1 (and add the mapping in the pristines table) in the upgrade step. I expected that this was already handled?

	Bert 


Re: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Philip Martin <ph...@wandisco.com>.
"Bert Huijben" <be...@qqmail.nl> writes:

>> From: Philip Martin [mailto:philip.martin@wandisco.com]
>> Sent: dinsdag 24 augustus 2010 16:58
>> To: Bert Huijben
>> Cc: 'Julian Foad'; 'Bert Huijben'; dev@subversion.apache.org
>> Subject: Re: svn commit: r988074 - in
>> /subversion/trunk/subversion/tests/cmdline: svntest/wc.py
>> upgrade_tests.py
>> 
>> "Bert Huijben" <be...@qqmail.nl> writes:
>> 
>> >> -----Original Message-----
>> >> From: Julian Foad [mailto:julian.foad@wandisco.com]
>> >>
>> >> If, instead, we construct each the PRISTINE table entry at the point
>> >> where we're converting an entry from the entries file, then we can
>> >> calculate both checksums on the fly, and we can store both of them in
>> >> the new DB row(s).  That's true even for those few pristines that don't
>> >> have any checksum in the 'entries' file.
>> 
>> Or we could modify the current code that constructs the pristine table
>> to update the base/working nodes as well.
>> 
>> > 1.0.0 working copies have no checksums at all if I remembered
>> > correctly and we certainly have to upgrade those WCs. Same recipe
>> > for all files with a revert base.
>> 
>> Well, upgrade_tests 7 (basic_upgrade_1_0) passes in single-db and it
>> verifies the pristine text post-upgrade using MD5.  When I look in the
>> tarball I see checksums in the entries file.
>> 
>> Revert bases won't have a checksum, but until we have NODE_DATA there
>> is nowhere to put a revert base checksum.
>
> Sure there is. 
> If you have a normal replaced file you can have one checksum in BASE_NODE
> (for the original) and one in WORKING_NODE (for the copy that replaced the
> original). Only when you delete that working node, you have no place to
> store it if that copy happens to be the child of the real copy.

Ah yes, of course.  That's one of the things we could fix by updating
base/working while construction pristine.  We need lots more upgrade
regression tests.

-- 
Philip

RE: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: dinsdag 24 augustus 2010 16:58
> To: Bert Huijben
> Cc: 'Julian Foad'; 'Bert Huijben'; dev@subversion.apache.org
> Subject: Re: svn commit: r988074 - in
> /subversion/trunk/subversion/tests/cmdline: svntest/wc.py
> upgrade_tests.py
> 
> "Bert Huijben" <be...@qqmail.nl> writes:
> 
> >> -----Original Message-----
> >> From: Julian Foad [mailto:julian.foad@wandisco.com]
> >>
> >> If, instead, we construct each the PRISTINE table entry at the point
> >> where we're converting an entry from the entries file, then we can
> >> calculate both checksums on the fly, and we can store both of them in
> >> the new DB row(s).  That's true even for those few pristines that don't
> >> have any checksum in the 'entries' file.
> 
> Or we could modify the current code that constructs the pristine table
> to update the base/working nodes as well.
> 
> > 1.0.0 working copies have no checksums at all if I remembered
> > correctly and we certainly have to upgrade those WCs. Same recipe
> > for all files with a revert base.
> 
> Well, upgrade_tests 7 (basic_upgrade_1_0) passes in single-db and it
> verifies the pristine text post-upgrade using MD5.  When I look in the
> tarball I see checksums in the entries file.
> 
> Revert bases won't have a checksum, but until we have NODE_DATA there
> is nowhere to put a revert base checksum.

Sure there is. 
If you have a normal replaced file you can have one checksum in BASE_NODE
(for the original) and one in WORKING_NODE (for the copy that replaced the
original). Only when you delete that working node, you have no place to
store it if that copy happens to be the child of the real copy.

	Bert

Re: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Philip Martin <ph...@wandisco.com>.
"Bert Huijben" <be...@qqmail.nl> writes:

>> -----Original Message-----
>> From: Julian Foad [mailto:julian.foad@wandisco.com]
>> 
>> If, instead, we construct each the PRISTINE table entry at the point
>> where we're converting an entry from the entries file, then we can
>> calculate both checksums on the fly, and we can store both of them in
>> the new DB row(s).  That's true even for those few pristines that don't
>> have any checksum in the 'entries' file.

Or we could modify the current code that constructs the pristine table
to update the base/working nodes as well.

> 1.0.0 working copies have no checksums at all if I remembered
> correctly and we certainly have to upgrade those WCs. Same recipe
> for all files with a revert base.

Well, upgrade_tests 7 (basic_upgrade_1_0) passes in single-db and it
verifies the pristine text post-upgrade using MD5.  When I look in the
tarball I see checksums in the entries file.

Revert bases won't have a checksum, but until we have NODE_DATA there
is nowhere to put a revert base checksum.

-- 
Philip

RE: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Julian Foad [mailto:julian.foad@wandisco.com]
> Sent: dinsdag 24 augustus 2010 13:04
> To: Bert Huijben
> Cc: 'Philip Martin'; dev@subversion.apache.org
> Subject: RE: svn commit: r988074 - in
> /subversion/trunk/subversion/tests/cmdline: svntest/wc.py
> upgrade_tests.py
> 
> Bert Huijben wrote:
> > Philip Martin wrote:
> > > "Bert Huijben" <be...@vmoo.com> writes:
> > >
> > > >> * subversion/tests/cmdline/upgrade_tests.py
> > > >>   (text_base_path): Restore MD5 support removed in r960036.
> > > >
> > > > I think the real fix would be to upgrade to SHA1 (and add the
> > > > mapping in the pristines table) in the upgrade step. I expected that
> > > > this was already handled?
> > >
> > > Yes, that needs to happen, and no, it doesn't happen yet.  The new
> > > code stores SHA1 on checkout/update but the upgrade code simply
> copies
> > > the MD5 and doesn't do MD5 to SHA1 conversion.  I discussed this with
> > > Julian on IRC yesterday, the plan is to remove the MD5 support
> > > eventually.
> > >
> > > There are two cases to consider, upgrade from 1.6 to latest and
> > > upgrade from older 1.7 to latest.  For the older 1.7 upgrade we can
> > > simply use the PRISTINE table to replace the MD5 with the
> > > corresponding SHA1 in the bump_to_19 code.
> > >
> > > The 1.6 upgrade is a bit harder.  We can do the text-base to pristine
> > > before doing the entries file, so that the PRISTINE table is
> > > available,
> 
> If, instead, we construct each the PRISTINE table entry at the point
> where we're converting an entry from the entries file, then we can
> calculate both checksums on the fly, and we can store both of them in
> the new DB row(s).  That's true even for those few pristines that don't
> have any checksum in the 'entries' file.

1.0.0 working copies have no checksums at all if I remembered correctly and we certainly have to upgrade those WCs. Same recipe for all files with a revert base.

> Maybe that makes the code flow harder, but it sounds easier than
> maintaining an intermediate store of checksums.
> 
> > but the table is not currently indexed on MD5.  As there is
> > > now only one table per wc it might be too slow if there are lots of
> > > files.  We may need an MD5 index, as part of PRISTINE or separate,
> > > just for the duration of the upgrade.
> 
> *If* we were to use that method (but see below), and *if* it does turn
> out to be too slow, then adding an index would be an easy change.  I
> don't think we need to hesitate from using MD5 look-ups on that account.
> 
> >   The bump_to_19 code can do the
> > > MD5 to SHA1 conversion before switching to single-db, the table is
> > > smaller and may not need an MD5 index (and the bump_to_19 code
> simply
> > > isn't as important as the 1.6 upgrade code).
> >
> > In the old entries format we only kept one checksum, while we can have
> two
> > pristine files, so just keeping it as MD5 can't solve all the issues.
> > But we can't just assume that we never see a collision with MD5 over an
> > entire tree.. or we wouldn't have switched to SHA1 in the first place.
> 
> MD5 collisions during upgrading an existing WC?  A remote possibility of
> course, but yes, let's try to avoid that possibility.  If MD5 look-up
> was the only practical way forward, especially if it were per-directory,
> then I wouldn't be too concerned about handling collisions gracefully
> and think we would only need to detect them and bail out with an
> apologetic message.
> 
> For upgrades from 1.7-dev versions, I think we should be happy to accept
> the possibility of MD5 collisions.

For dev versions no problem, but from upgrades below from format 12 (=last entries files version) or older we should/must do the right thing.
(See the other mail: Just make the intermediate versions use the python script. These users knew that this was an option when they started using trunk)

	Bert

RE: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Julian Foad <ju...@wandisco.com>.
Bert Huijben wrote: 
> Philip Martin wrote:
> > "Bert Huijben" <be...@vmoo.com> writes:
> > 
> > >> * subversion/tests/cmdline/upgrade_tests.py
> > >>   (text_base_path): Restore MD5 support removed in r960036.
> > >
> > > I think the real fix would be to upgrade to SHA1 (and add the
> > > mapping in the pristines table) in the upgrade step. I expected that
> > > this was already handled?
> > 
> > Yes, that needs to happen, and no, it doesn't happen yet.  The new
> > code stores SHA1 on checkout/update but the upgrade code simply copies
> > the MD5 and doesn't do MD5 to SHA1 conversion.  I discussed this with
> > Julian on IRC yesterday, the plan is to remove the MD5 support
> > eventually.
> > 
> > There are two cases to consider, upgrade from 1.6 to latest and
> > upgrade from older 1.7 to latest.  For the older 1.7 upgrade we can
> > simply use the PRISTINE table to replace the MD5 with the
> > corresponding SHA1 in the bump_to_19 code.
> > 
> > The 1.6 upgrade is a bit harder.  We can do the text-base to pristine
> > before doing the entries file, so that the PRISTINE table is
> > available,

If, instead, we construct each the PRISTINE table entry at the point
where we're converting an entry from the entries file, then we can
calculate both checksums on the fly, and we can store both of them in
the new DB row(s).  That's true even for those few pristines that don't
have any checksum in the 'entries' file.

Maybe that makes the code flow harder, but it sounds easier than
maintaining an intermediate store of checksums.

> but the table is not currently indexed on MD5.  As there is
> > now only one table per wc it might be too slow if there are lots of
> > files.  We may need an MD5 index, as part of PRISTINE or separate,
> > just for the duration of the upgrade.

*If* we were to use that method (but see below), and *if* it does turn
out to be too slow, then adding an index would be an easy change.  I
don't think we need to hesitate from using MD5 look-ups on that account.

>   The bump_to_19 code can do the
> > MD5 to SHA1 conversion before switching to single-db, the table is
> > smaller and may not need an MD5 index (and the bump_to_19 code simply
> > isn't as important as the 1.6 upgrade code).
> 
> In the old entries format we only kept one checksum, while we can have two
> pristine files, so just keeping it as MD5 can't solve all the issues.
> But we can't just assume that we never see a collision with MD5 over an
> entire tree.. or we wouldn't have switched to SHA1 in the first place.

MD5 collisions during upgrading an existing WC?  A remote possibility of
course, but yes, let's try to avoid that possibility.  If MD5 look-up
was the only practical way forward, especially if it were per-directory,
then I wouldn't be too concerned about handling collisions gracefully
and think we would only need to detect them and bail out with an
apologetic message.

For upgrades from 1.7-dev versions, I think we should be happy to accept
the possibility of MD5 collisions.

> Maybe we should use a somewhat broader structure then just a single (or
> dual) svn_wc_entry_t to keep the state while upgrading. This can then
> contain things like the SHA1 checksums and other values that can't be stored
> in just the entries.

That sounds harder than transferring the pristines in-line, though I'm not sure.

- Julian



RE: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Julian Foad <ju...@wandisco.com>.
Bert Huijben wrote: 
> Philip Martin wrote:
> > "Bert Huijben" <be...@vmoo.com> writes:
> > 
> > >> * subversion/tests/cmdline/upgrade_tests.py
> > >>   (text_base_path): Restore MD5 support removed in r960036.
> > >
> > > I think the real fix would be to upgrade to SHA1 (and add the
> > > mapping in the pristines table) in the upgrade step. I expected that
> > > this was already handled?
> > 
> > Yes, that needs to happen, and no, it doesn't happen yet.  The new
> > code stores SHA1 on checkout/update but the upgrade code simply copies
> > the MD5 and doesn't do MD5 to SHA1 conversion.  I discussed this with
> > Julian on IRC yesterday, the plan is to remove the MD5 support
> > eventually.
> > 
> > There are two cases to consider, upgrade from 1.6 to latest and
> > upgrade from older 1.7 to latest.  For the older 1.7 upgrade we can
> > simply use the PRISTINE table to replace the MD5 with the
> > corresponding SHA1 in the bump_to_19 code.
> > 
> > The 1.6 upgrade is a bit harder.  We can do the text-base to pristine
> > before doing the entries file, so that the PRISTINE table is
> > available,

If, instead, we construct each the PRISTINE table entry at the point
where we're converting an entry from the entries file, then we can
calculate both checksums on the fly, and we can store both of them in
the new DB row(s).  That's true even for those few pristines that don't
have any checksum in the 'entries' file.

Maybe that makes the code flow harder, but it sounds easier than
maintaining an intermediate store of checksums.

> but the table is not currently indexed on MD5.  As there is
> > now only one table per wc it might be too slow if there are lots of
> > files.  We may need an MD5 index, as part of PRISTINE or separate,
> > just for the duration of the upgrade.

*If* we were to use that method (but see below), and *if* it does turn
out to be too slow, then adding an index would be an easy change.  I
don't think we need to hesitate from using MD5 look-ups on that account.

>   The bump_to_19 code can do the
> > MD5 to SHA1 conversion before switching to single-db, the table is
> > smaller and may not need an MD5 index (and the bump_to_19 code simply
> > isn't as important as the 1.6 upgrade code).
> 
> In the old entries format we only kept one checksum, while we can have two
> pristine files, so just keeping it as MD5 can't solve all the issues.
> But we can't just assume that we never see a collision with MD5 over an
> entire tree.. or we wouldn't have switched to SHA1 in the first place.

MD5 collisions during upgrading an existing WC?  A remote possibility of
course, but yes, let's try to avoid that possibility.  If MD5 look-up
was the only practical way forward, especially if it were per-directory,
then I wouldn't be too concerned about handling collisions gracefully
and think we would only need to detect them and bail out with an
apologetic message.

For upgrades from 1.7-dev versions, I think we should be happy to accept
the possibility of MD5 collisions.

> Maybe we should use a somewhat broader structure then just a single (or
> dual) svn_wc_entry_t to keep the state while upgrading. This can then
> contain things like the SHA1 checksums and other values that can't be stored
> in just the entries.

That sounds harder than transferring the pristines in-line, though I'm not sure.

- Julian


Re: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Philip Martin <ph...@wandisco.com>.
"Bert Huijben" <be...@vmoo.com> writes:

> In the old entries format we only kept one checksum, while we can have two
> pristine files, so just keeping it as MD5 can't solve all the issues.

That's NODE_DATA, nothing to do with single-db.

> But we can't just assume that we never see a collision with MD5 over an
> entire tree.. or we wouldn't have switched to SHA1 in the first place.
>
> Maybe we should use a somewhat broader structure then just a single (or
> dual) svn_wc_entry_t to keep the state while upgrading. This can then
> contain things like the SHA1 checksums and other values that can't be stored
> in just the entries.
>
> Just looking up a node by its MD5 in the pristines table will not resolve
> the collision problems :(

Single-db does make MD5 collisions more likely, but is it a real
problem? I know it's possible to create MD5 collisions in certain
circumstances, but are there any reports of accidental collisions
rather than deliberately created ones?

If we create the temporary MD5 index during the upgrade then we can
probably spot the collision and abort.  I think that would be good
enough.

-- 
Philip

RE: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Bert Huijben <be...@vmoo.com>.

> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: dinsdag 24 augustus 2010 10:40
> To: Bert Huijben
> Cc: dev@subversion.apache.org
> Subject: Re: svn commit: r988074 - in
> /subversion/trunk/subversion/tests/cmdline: svntest/wc.py
> upgrade_tests.py
> 
> "Bert Huijben" <be...@vmoo.com> writes:
> 
> >> * subversion/tests/cmdline/upgrade_tests.py
> >>   (text_base_path): Restore MD5 support removed in r960036.
> >
> > I think the real fix would be to upgrade to SHA1 (and add the
> > mapping in the pristines table) in the upgrade step. I expected that
> > this was already handled?
> 
> Yes, that needs to happen, and no, it doesn't happen yet.  The new
> code stores SHA1 on checkout/update but the upgrade code simply copies
> the MD5 and doesn't do MD5 to SHA1 conversion.  I discussed this with
> Julian on IRC yesterday, the plan is to remove the MD5 support
> eventually.
> 
> There are two cases to consider, upgrade from 1.6 to latest and
> upgrade from older 1.7 to latest.  For the older 1.7 upgrade we can
> simply use the PRISTINE table to replace the MD5 with the
> corresponding SHA1 in the bump_to_19 code.
> 
> The 1.6 upgrade is a bit harder.  We can do the text-base to pristine
> before doing the entries file, so that the PRISTINE table is
> available, but the table is not currently indexed on MD5.  As there is
> now only one table per wc it might be too slow if there are lots of
> files.  We may need an MD5 index, as part of PRISTINE or separate,
> just for the duration of the upgrade.  The bump_to_19 code can do the
> MD5 to SHA1 conversion before switching to single-db, the table is
> smaller and may not need an MD5 index (and the bump_to_19 code simply
> isn't as important as the 1.6 upgrade code).

In the old entries format we only kept one checksum, while we can have two
pristine files, so just keeping it as MD5 can't solve all the issues.
But we can't just assume that we never see a collision with MD5 over an
entire tree.. or we wouldn't have switched to SHA1 in the first place.

Maybe we should use a somewhat broader structure then just a single (or
dual) svn_wc_entry_t to keep the state while upgrading. This can then
contain things like the SHA1 checksums and other values that can't be stored
in just the entries.

Just looking up a node by its MD5 in the pristines table will not resolve
the collision problems :(

	Bert

Re: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

Posted by Philip Martin <ph...@wandisco.com>.
"Bert Huijben" <be...@vmoo.com> writes:

>> * subversion/tests/cmdline/upgrade_tests.py
>>   (text_base_path): Restore MD5 support removed in r960036.
>
> I think the real fix would be to upgrade to SHA1 (and add the
> mapping in the pristines table) in the upgrade step. I expected that
> this was already handled?

Yes, that needs to happen, and no, it doesn't happen yet.  The new
code stores SHA1 on checkout/update but the upgrade code simply copies
the MD5 and doesn't do MD5 to SHA1 conversion.  I discussed this with
Julian on IRC yesterday, the plan is to remove the MD5 support
eventually.

There are two cases to consider, upgrade from 1.6 to latest and
upgrade from older 1.7 to latest.  For the older 1.7 upgrade we can
simply use the PRISTINE table to replace the MD5 with the
corresponding SHA1 in the bump_to_19 code.

The 1.6 upgrade is a bit harder.  We can do the text-base to pristine
before doing the entries file, so that the PRISTINE table is
available, but the table is not currently indexed on MD5.  As there is
now only one table per wc it might be too slow if there are lots of
files.  We may need an MD5 index, as part of PRISTINE or separate,
just for the duration of the upgrade.  The bump_to_19 code can do the
MD5 to SHA1 conversion before switching to single-db, the table is
smaller and may not need an MD5 index (and the bump_to_19 code simply
isn't as important as the 1.6 upgrade code).

-- 
Philip