Posted to dev@subversion.apache.org by Evgeny Kotkov via dev <de...@subversion.apache.org> on 2023/03/22 15:23:12 UTC

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Evgeny Kotkov <ev...@visualsvn.com> writes:

> > Now, how hard would this be to actually implement?
>
> To have a more or less accurate estimate, I went ahead and prepared the
> first-cut implementation of an approach that makes the pristine checksum
> kind configurable in a working copy.
>
> The current implementation passes all tests in my environment and seems to
> work in practice.  It is available on the branch:
>
>   https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind
>
> The implementation on the branch allows creating working copies that use a
> checksum kind other than SHA-1.

I extended the current implementation to use a dynamically salted SHA-1
checksum, rather than a SHA-1 with a statically hardcoded salt.
The dynamic salt is generated during the creation of a wc.db.
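
To illustrate the idea only (this is not the code on the branch): a minimal
Python sketch of a per-working-copy salted SHA-1. The salt length, the
function names, and the choice to prefix the salt to the content are all
assumptions made for the example.

```python
import hashlib
import os

SALT_LEN = 16  # assumed length; the branch may use a different size


def new_wc_salt() -> bytes:
    """Generate a random salt once, e.g. when the wc.db is created."""
    return os.urandom(SALT_LEN)


def salted_sha1(salt: bytes, content: bytes) -> str:
    """Hash the salt followed by the content (prefixing is an assumption)."""
    h = hashlib.sha1()
    h.update(salt)
    h.update(content)
    return h.hexdigest()


# Two working copies with different salts get different checksums for the
# same content, so a collision prepared in advance against plain SHA-1 does
# not automatically carry over to a given working copy.
salt_a, salt_b = new_wc_salt(), new_wc_salt()
data = b"hello, pristine\n"
digest_a = salted_sha1(salt_a, data)
digest_b = salted_sha1(salt_b, data)
```

Because the salt is chosen when the working copy is created, an attacker
would need to know it before constructing colliding contents.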

The implementation is available on a separate branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt

The change is fairly large, but I think that it should solve the potential
problem without any practical drawbacks, except for the lack of the
mentioned ra_serf fetch optimization.

So overall I'd propose to bring this change to trunk, to improve the current
state around checksum collisions in the working copy, and to also have the
infrastructure for supporting different checksum kinds in place, in case
we need it in the future.

This change is still being blocked by a veto, but if danielsh changes his
mind and if there won't be other objections, I'm ready to complete the few
remaining bits and merge it to trunk.


Thanks,
Evgeny Kotkov

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Branko Čibej <br...@apache.org>.
On 18.01.2024 08:43, Daniel Sahlberg wrote:
> As far as I understand, the point of multi-hash is to keep the WC 
> format between versions (so older clients can continue to use the WC). 
> I need some help to understand how that would work in practice. Let's 
> say that 1.15 adds SHAABC, 1.16 adds SHAXYZ. Then 1.17 drops SHA1. But...
> - A 1.17 client will only use SHAABC or SHAXYZ hashes.
> - A 1.16 client can use SHA1, SHAABC and SHAXYZ hashes.
> - A 1.15 client can only use SHA1 and SHAABC hashes.
>
> How can these work together? A WC created in 1.17 can't be used by a 
> 1.15 client and a WC created in 1.15 (with SHA1) can't be used by a 
> 1.17 client. How is this different from bumping the format? How do we 
> detect this?

It's just another dimension of changing the format. When you introduce 
multihash, you have to bump the format number so that clients that don't 
know about it won't try to use the WC. Clients that _do_ know about it 
will have to check which hash algorithm(s) are used in any case.
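
A rough sketch of that two-step check, in Python for brevity. The format
numbers (31 as the last pre-multihash format, 32 as the first multihash
one), the client table, and all names are hypothetical; SHAABC and SHAXYZ
are the placeholder algorithms from the question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    name: str
    max_format: int       # newest WC format this client understands
    hashes: frozenset     # hash kinds this client implements


@dataclass(frozen=True)
class WorkingCopy:
    format: int
    hash_kind: str


def can_use(client: Client, wc: WorkingCopy) -> bool:
    # Step 1: the format number keeps out clients that predate multihash.
    if wc.format > client.max_format:
        return False
    # Step 2: a multihash-aware client still has to check the hash kind.
    return wc.hash_kind in client.hashes


# Hypothetical clients matching the scenario in the question.
c114 = Client("1.14", 31, frozenset({"SHA1"}))
c115 = Client("1.15", 32, frozenset({"SHA1", "SHAABC"}))
c116 = Client("1.16", 32, frozenset({"SHA1", "SHAABC", "SHAXYZ"}))
c117 = Client("1.17", 32, frozenset({"SHAABC", "SHAXYZ"}))

wc_sha1 = WorkingCopy(32, "SHA1")
wc_xyz = WorkingCopy(32, "SHAXYZ")
```

Here the format bump answers "does this client know about multihash at
all?", and the hash-kind check answers "does it know this particular hash?".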


> At least, we'd need some method of updating the hashes in the
> database, akin to the WC format upgrades in some versions (was it 1.8?).

"svn upgrade" is where this would happen. On the multi-wc-format branch 
(if memory serves), it accepts a target WC version -- which is 
equivalent to the feature set supported by the WC. There's no reason why 
it couldn't also grow a "--force-hash=quantum-entangled" option.

-- Brane

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Nathan Hartman <ha...@gmail.com>.
On Sat, Jan 13, 2024 at 3:56 PM Nathan Hartman <ha...@gmail.com>
wrote:

> Pros: Future-proofing against the real and perceived brokenness of any
> hash types.
>

I meant to write:

Pros: Future-proofing against the real and perceived brokenness of any hash
types, or the deprecation and later removal of their implementations from
our deps.

Cheers,
Nathan

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Evgeny Kotkov via dev <de...@subversion.apache.org>.
Daniel Sahlberg <da...@gmail.com> writes:

> As far as I understand, the point of multi-hash is to keep the WC format
> between versions (so older clients can continue to use the WC).

Just as a minor note, the working copies created using the implementation
on the `pristine-checksum-salt` branch don't multi-hash the contents, but
rather make the [single] used checksum kind configurable and persist it at
the moment when a working copy is created or upgraded.

> I need some help to understand how that would work in practice. Let's say
> that 1.15 adds SHAABC, 1.16 adds SHAXYZ. Then 1.17 drops SHA1. But...
> - A 1.17 client will only use SHAABC or SHAXYZ hashes.
> - A 1.16 client can use SHA1, SHAABC and SHAXYZ hashes.
> - A 1.15 client can only use SHA1 and SHAABC hashes.
>
> How can these work together? A WC created in 1.17 can't be used by a 1.15
> client and a WC created in 1.15 (with SHA1) can't be used by a 1.17 client.
> How is this different from bumping the format? How do we detect this?

In the current design available on the `pristine-checksum-salt` branch, the
supported checksum kinds are tied to a working copy format, and any supported
checksum kind may additionally use a dynamic salt.  For example, format 33
supports only SHA-1 (regular or dynamically salted), but a newer format 34
can add support for another checksum kind such as SHA-2 if necessary.

When an existing working copy is upgraded to a newer format, its current
checksum kind is retained as is (we can't rehash the content in a
`--store-pristine=no` case because the pristines are not available).
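
As a rough illustration of the two paragraphs above (not the actual code on
the branch), the relationship between formats and checksum kinds could be
modelled like this. The kind names are made up for the example, and format
34 with SHA-2 is the hypothetical future format mentioned above.

```python
# Checksum kinds supported by each working copy format.
SUPPORTED_KINDS = {
    33: {"sha1", "sha1-salted"},
    34: {"sha1", "sha1-salted", "sha2"},  # hypothetical future format
}


def upgrade_wc(wc_format: int, checksum_kind: str, target_format: int):
    """Return the (format, checksum kind) pair after an upgrade.

    The checksum kind is retained as is: without the pristines there is
    nothing to rehash.
    """
    if target_format < wc_format:
        raise ValueError("downgrade is not supported in this sketch")
    if checksum_kind not in SUPPORTED_KINDS[target_format]:
        raise ValueError("target format does not support this checksum kind")
    return target_format, checksum_kind


upgraded = upgrade_wc(33, "sha1-salted", 34)
```

In this model a new checksum kind only ever appears in freshly created
working copies; an upgrade changes the format number and nothing else.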

I don't know if we'll find ourselves having to forcefully phase out SHA-1
*even* for such working copies that retain an older checksum kind, i.e.,
it might be enough to use the new checksum kind only for freshly created
working copies.  However, there would be a few options to consider:

I think that milder options could include warning the user to check out a
new working copy (that would use a different checksum kind), and a harsher
option could mean adding a new format that doesn't support SHA-1 under
any circumstances, and declaring all previously available working copy
formats unsupported.


Regards,
Evgeny Kotkov

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Daniel Sahlberg <da...@gmail.com>.
@Karl Fogel <kf...@red-bean.com>,  @Evgeny Kotkov
<ev...@visualsvn.com>

Any chance for a comment on the questions in this thread?

I've also added my own comment below.

Kind regards,
Daniel



On Sun, 14 Jan 2024 at 00:56, Nathan Hartman <hartman.nathan@gmail.com> wrote:

> On Fri, Jan 12, 2024 at 3:51 PM Johan Corveleyn <jc...@gmail.com> wrote:
>
>> On Fri, Jan 12, 2024 at 12:37 PM Daniel Shahaf <d....@daniel.shahaf.name>
>> wrote:
>> ...
>> > Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
>> > I had the context in our heads, and the cache misses took their toll in
>> > tuits and in wallclock time.  Furthermore, I have less spare time for
>> > dev@ discussions than I did when I cast the veto (= a year ago next
>> > Saturday).  Going forward it might be preferable for threads not to
>> > hibernate.
>>
>> I agree, but obviously the hibernation is not some deliberate action
>> by anyone. It's just that most of us here have less spare time for
>> dev@ discussions (and for SVN development) than before. Especially for
>> such complex matters, and especially when people feel they are
>> walking into a minefield. There are only a few active devs left, and
>> tuits are running low ...
>>
>> ...
>> > That being the case, I have considered whether merging the feature
>> > branch outweighs letting dev@ take a not-only-/pro forma/ role in
>> > design discussions.  I am of the opinion that it does not, and
>> > therefore I reaffirm the veto.
>>
>> It has become more clear to me (I was only following tangentially)
>> that your veto is focused on the development methodology and the lack
>> of design discussion. Is that a valid reason for a veto? We are low on
>> resources, someone still finds time to make some progress, no one
>> blocks it on technical grounds, and then someone vetoes it because we
>> don't have enough resources?
>>
>> That puts us pretty much in deadlock, because we are too low on
>> resources. Or maybe I misunderstand?
>>
>> To be clear: I appreciate your input, Daniel, and your insistence on a
>> more thorough design discussion. I assume it's coming from a genuine
>> concern that we formulate problems well, and think hard about possible
>> solutions (focusing on the precise problem we are trying to solve).
>> But at the end of the day, if that design discussion doesn't happen
>> (or not enough to your satisfaction anyway), is that grounds for a
>> veto? For me it's a tough call, because on the one hand you have a
>> point, but on the other hand ... you're blocking _some_ progress
>> because the process behind it is not perfect (which is hard to do with
>> the 3.25 tuits we have left).
>>
>> > P.S.  Could that BRANCH-README please state what's the problem the branch
>> > means to solve, i.e., the goal / acceptance test?  "Make it possible to
>> > «svn add» SHA-1 collisions"?
>>
>> I agree that would be a good step.
>>
>> I too find it a bit unclear what problem we're actually trying to
>> solve, apart from a vague feeling that SHA-1 will become more and more
>> broken over time, and that this will cause fatal injury to SVN (in its
>> WC, protocol, dump format, or repository). And perhaps the fact that
>> security auditors are becoming more and more triggered by seeing SHA-1
>> (even if they don't understand the way it is used and its
>> ramifications). Making it possible to 'svn add' SHA-1 collisions is
>> not it, I think.
>>
>> --
>> Johan
>>
>
>
> Johan's reply sums up my thoughts pretty closely.
>
> I would very much like to *avoid* all of the following: deadlock, bad
> feelings, and members of this small community leaving because of deadlocks
> or bad feelings.
>
> I agree that (at the very least), BRANCH-README should define what problem
> the branch aims to solve, and perhaps that's really the main thing we need
> to discuss and resolve.
>
> Johan touched on one issue with SHA1: regardless of how it is actually used
> in SVN and whether it is adequate for those purposes, there is customer
> perception. I can imagine, for example, the IT dept of some big
> $corporation could blacklist SHA1 because it is considered broken for
> cryptographic purposes. But they could blacklist it for everything. Even
> though it is safe and effective for our use cases, try explaining that to
> an admin who is struggling to meet such a blanket policy.
>
> I would like to add another reason to think about a post-SHA1 future: I'm
> writing on mobile so I can't easily grep for things now, but could our
> dependencies eventually remove the SHA1 implementation? (I just saw
> something about removal of DSA from some famous lib not too long ago. SHA1
> could be next?)
>
> When would SHA1 disappear? I don't know, but I consider it plausible to
> happen in about 5 years.
>
> If SHA1 is removed in the future, there will need to be a mad dash to
> replace it. Or we'll have to add a new dependency to use an alternate
> implementation. Or we'll have to implement our own SHA1 or copy some code
> into SVN. All of these seem bad to me.
>
> Switching to a different hash is also a bad idea, I think, because it is
> likely to suffer the same problems as SHA1 later on, as cryptography
> research proceeds and newer hashes become declared broken.
>
> I'll try to describe what I think is a best case scenario: Support
> multi-hash in 1.15 in format 32 WCs. SHA1 can continue to be the default
> but we should be careful not to require a SHA1 implementation to exist.
> Furthermore, by default "svn checkout" continues to create format 31 WCs
> (this is implemented currently). When new (1.15 and up) servers talk to new
> clients, they'll have to negotiate the "best" common hash for the protocol.
> Over time, we can add other hashes. Over time, distros and package managers
> pick up 1.15. Someday down the line (5 years?), if SHA1 goes away, or an IT
> dept wants to avoid SHA1 for whatever reasons, most of the hard work of
> changing hashes will have been done already and most people will have the
> newer software on their system already. Changing hashes then becomes a
> trivial matter. The same will be true of any future hashes that become
> declared broken, requiring almost no additional work on our part. Notably,
> it will not be necessary to bump the WC or protocol formats because of
> hashes.
>

> Pros: Future-proofing against the real and perceived brokenness of any
> hash types.
>
> Cons: Requires a lot of work up front, which no one might volunteer to do.
>
> We should continue hashing out (pun intended) how to address the different
> concerns raised.
>
> Are there any technical reasons *not* to support other hashes going
> forward?
>
> Are there other pros or cons to supporting a scenario like I described?
>

As far as I understand, the point of multi-hash is to keep the WC format
between versions (so older clients can continue to use the WC). I need some
help to understand how that would work in practice. Let's say that 1.15
adds SHAABC, 1.16 adds SHAXYZ. Then 1.17 drops SHA1. But...
- A 1.17 client will only use SHAABC or SHAXYZ hashes.
- A 1.16 client can use SHA1, SHAABC and SHAXYZ hashes.
- A 1.15 client can only use SHA1 and SHAABC hashes.

How can these work together? A WC created in 1.17 can't be used by a 1.15
client and a WC created in 1.15 (with SHA1) can't be used by a 1.17 client.
How is this different from bumping the format? How do we detect this?

At least, we'd need some method of updating the hashes in the database,
akin to the WC format upgrades in some versions (was it 1.8?).

Kind regards,
Daniel

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Nathan Hartman <ha...@gmail.com>.
On Fri, Jan 12, 2024 at 3:51 PM Johan Corveleyn <jc...@gmail.com> wrote:

> On Fri, Jan 12, 2024 at 12:37 PM Daniel Shahaf <d....@daniel.shahaf.name>
> wrote:
> ...
> > Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
> > I had the context in our heads, and the cache misses took their toll in
> > tuits and in wallclock time.  Furthermore, I have less spare time for
> > dev@ discussions than I did when I cast the veto (= a year ago next
> > Saturday).  Going forward it might be preferable for threads not to
> > hibernate.
>
> I agree, but obviously the hibernation is not some deliberate action
> by anyone. It's just that most of us here have less spare time for
> dev@ discussions (and for SVN development) than before. Especially for
> such complex matters, and especially when people feel they are
> walking into a minefield. There are only a few active devs left, and
> tuits are running low ...
>
> ...
> > That being the case, I have considered whether merging the feature
> > branch outweighs letting dev@ take a not-only-/pro forma/ role in
> > design discussions.  I am of the opinion that it does not, and
> > therefore I reaffirm the veto.
>
> It has become more clear to me (I was only following tangentially)
> that your veto is focused on the development methodology and the lack
> of design discussion. Is that a valid reason for a veto? We are low on
> resources, someone still finds time to make some progress, no one
> blocks it on technical grounds, and then someone vetoes it because we
> don't have enough resources?
>
> That puts us pretty much in deadlock, because we are too low on
> resources. Or maybe I misunderstand?
>
> To be clear: I appreciate your input, Daniel, and your insistence on a
> more thorough design discussion. I assume it's coming from a genuine
> concern that we formulate problems well, and think hard about possible
> solutions (focusing on the precise problem we are trying to solve).
> But at the end of the day, if that design discussion doesn't happen
> (or not enough to your satisfaction anyway), is that grounds for a
> veto? For me it's a tough call, because on the one hand you have a
> point, but on the other hand ... you're blocking _some_ progress
> because the process behind it is not perfect (which is hard to do with
> the 3.25 tuits we have left).
>
> > P.S.  Could that BRANCH-README please state what's the problem the branch
> > means to solve, i.e., the goal / acceptance test?  "Make it possible to
> > «svn add» SHA-1 collisions"?
>
> I agree that would be a good step.
>
> I too find it a bit unclear what problem we're actually trying to
> solve, apart from a vague feeling that SHA-1 will become more and more
> broken over time, and that this will cause fatal injury to SVN (in its
> WC, protocol, dump format, or repository). And perhaps the fact that
> security auditors are becoming more and more triggered by seeing SHA-1
> (even if they don't understand the way it is used and its
> ramifications). Making it possible to 'svn add' SHA-1 collisions is
> not it, I think.
>
> --
> Johan
>


Johan's reply sums up my thoughts pretty closely.

I would very much like to *avoid* all of the following: deadlock, bad
feelings, and members of this small community leaving because of deadlocks
or bad feelings.

I agree that (at the very least), BRANCH-README should define what problem
the branch aims to solve, and perhaps that's really the main thing we need
to discuss and resolve.

Johan touched on one issue with SHA1: regardless of how it is actually used in
SVN and whether it is adequate for those purposes, there is customer
perception. I can imagine, for example, the IT dept of some big
$corporation could blacklist SHA1 because it is considered broken for
cryptographic purposes. But they could blacklist it for everything. Even
though it is safe and effective for our use cases, try explaining that to
an admin who is struggling to meet such a blanket policy.

I would like to add another reason to think about a post-SHA1 future: I'm
writing on mobile so I can't easily grep for things now, but could our
dependencies eventually remove the SHA1 implementation? (I just saw
something about removal of DSA from some famous lib not too long ago. SHA1
could be next?)

When would SHA1 disappear? I don't know, but I consider it plausible to
happen in about 5 years.

If SHA1 is removed in the future, there will need to be a mad dash to
replace it. Or we'll have to add a new dependency to use an alternate
implementation. Or we'll have to implement our own SHA1 or copy some code
into SVN. All of these seem bad to me.

Switching to a different hash is also a bad idea, I think, because it is
likely to suffer the same problems as SHA1 later on, as cryptography
research proceeds and newer hashes become declared broken.

I'll try to describe what I think is a best case scenario: Support
multi-hash in 1.15 in format 32 WCs. SHA1 can continue to be the default
but we should be careful not to require a SHA1 implementation to exist.
Furthermore, by default "svn checkout" continues to create format 31 WCs
(this is implemented currently). When new (1.15 and up) servers talk to new
clients, they'll have to negotiate the "best" common hash for the protocol.
Over time, we can add other hashes. Over time, distros and package managers
pick up 1.15. Someday down the line (5 years?), if SHA1 goes away, or an IT
dept wants to avoid SHA1 for whatever reasons, most of the hard work of
changing hashes will have been done already and most people will have the
newer software on their system already. Changing hashes then becomes a
trivial matter. The same will be true of any future hashes that become
declared broken, requiring almost no additional work on our part. Notably,
it will not be necessary to bump the WC or protocol formats because of
hashes.

Pros: Future-proofing against the real and perceived brokenness of any hash
types.

Cons: Requires a lot of work up front, which no one might volunteer to do.
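
To make the negotiation step a bit more concrete, here is a small Python
sketch. Nothing like this exists in the protocol today; the function name,
the preference list, and the placeholder hash names are all assumptions.

```python
def negotiate_hash(preference, client_hashes, server_hashes):
    """Pick the most preferred hash kind that both sides support.

    `preference` is ordered from most to least preferred. Returns None
    when there is no common hash kind.
    """
    common = set(client_hashes) & set(server_hashes)
    for kind in preference:
        if kind in common:
            return kind
    return None


# Hypothetical preference order: newest first, SHA1 last.
PREFERENCE = ["SHAXYZ", "SHAABC", "SHA1"]

chosen = negotiate_hash(PREFERENCE, {"SHA1", "SHAABC"}, {"SHAABC", "SHAXYZ"})
```

With a step like this, dropping a hash becomes a matter of removing it from
one side's set, as long as at least one common hash remains.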

We should continue hashing out (pun intended) how to address the different
concerns raised.

Are there any technical reasons *not* to support other hashes going forward?

Are there other pros or cons to supporting a scenario like I described?

Thanks,
Nathan

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Daniel Sahlberg <da...@gmail.com>.
On Sat, 13 Jan 2024 at 00:50, Johan Corveleyn <jc...@gmail.com> wrote:

> On Fri, Jan 12, 2024 at 12:37 PM Daniel Shahaf <d....@daniel.shahaf.name>
> wrote:
> ...
> > Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
> > I had the context in our heads, and the cache misses took their toll in
> > tuits and in wallclock time.  Furthermore, I have less spare time for
> > dev@ discussions than I did when I cast the veto (= a year ago next
> > Saturday).  Going forward it might be preferable for threads not to
> > hibernate.
>
> I agree, but obviously the hibernation is not some deliberate action
> by anyone. It's just that most of us here have less spare time for
> dev@ discussions (and for SVN development) than before. Especially for
> such complex matters, and especially when people feel they are
> walking into a minefield. There are only a few active devs left, and
> tuits are running low ...
>

I agree with Johan on this. The long hiatus is unfortunate. But it won't
help to point fingers at this point.



>
> ...
> > That being the case, I have considered whether merging the feature
> > branch outweighs letting dev@ take a not-only-/pro forma/ role in
> > design discussions.  I am of the opinion that it does not, and
> > therefore I reaffirm the veto.
>
> It has become more clear to me (I was only following tangentially)
> that your veto is focused on the development methodology and the lack
> of design discussion. Is that a valid reason for a veto? We are low on
> resources, someone still finds time to make some progress, no one
> blocks it on technical grounds, and then someone vetoes it because we
> don't have enough resources?
>
> That puts us pretty much in deadlock, because we are too low on
> resources. Or maybe I misunderstand?
>
> To be clear: I appreciate your input, Daniel, and your insistence on a
> more thorough design discussion. I assume it's coming from a genuine
> concern that we formulate problems well, and think hard about possible
> solutions (focusing on the precise problem we are trying to solve).
> But at the end of the day, if that design discussion doesn't happen
> (or not enough to your satisfaction anyway), is that grounds for a
> veto? For me it's a tough call, because on the one hand you have a
> point, but on the other hand ... you're blocking _some_ progress
> because the process behind it is not perfect (which is hard to do with
> the 3.25 tuits we have left).
>
> > P.S.  Could that BRANCH-README please state what's the problem the branch
> > means to solve, i.e., the goal / acceptance test?  "Make it possible to
> > «svn add» SHA-1 collisions"?
>
> I agree that would be a good step.
>
> I too find it a bit unclear what problem we're actually trying to
> solve, apart from a vague feeling that SHA-1 will become more and more
> broken over time, and that this will cause fatal injury to SVN (in its
> WC, protocol, dump format, or repository). And perhaps the fact that
> security auditors are becoming more and more triggered by seeing SHA-1
> (even if they don't understand the way it is used and its
> ramifications). Making it possible to 'svn add' SHA-1 collisions is
> not it, I think.
>

I also agree with this.

From what I remember of the earlier discussions, there were concerns that a
changed file might go undetected if someone replaced it with another file
whose checksum collides with the original's. I think that might be a valid
point, especially if we don't have the pristine files anymore.

I'd also like to understand why we need the multi-checksum format instead
of just plainly switching to XXX (insert favourite checksumming algorithm
here). Does it help us to have multiple types of checksums available? Would
we use BOTH as a resort (likelihood of collision in SHA1 and in XXX at the
same time approaching zero)? Does it help backwards/forwards compatibility?

Kind regards,
Daniel Sahlberg

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Jan 12, 2024 at 12:37 PM Daniel Shahaf <d....@daniel.shahaf.name> wrote:
...
> Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
> I had the context in our heads, and the cache misses took their toll in
> tuits and in wallclock time.  Furthermore, I have less spare time for
> dev@ discussions than I did when I cast the veto (= a year ago next
> Saturday).  Going forward it might be preferable for threads not to
> hibernate.

I agree, but obviously the hibernation is not some deliberate action
by anyone. It's just that most of us here have less spare time for
dev@ discussions (and for SVN development) than before. Especially for
such complex matters, and especially when people feel they are
walking into a minefield. There are only a few active devs left, and
tuits are running low ...

...
> That being the case, I have considered whether merging the feature
> branch outweighs letting dev@ take a not-only-/pro forma/ role in
> design discussions.  I am of the opinion that it does not, and
> therefore I reaffirm the veto.

It has become more clear to me (I was only following tangentially)
that your veto is focused on the development methodology and the lack
of design discussion. Is that a valid reason for a veto? We are low on
resources, someone still finds time to make some progress, no one
blocks it on technical grounds, and then someone vetoes it because we
don't have enough resources?

That puts us pretty much in deadlock, because we are too low on
resources. Or maybe I misunderstand?

To be clear: I appreciate your input, Daniel, and your insistence on a
more thorough design discussion. I assume it's coming from a genuine
concern that we formulate problems well, and think hard about possible
solutions (focusing on the precise problem we are trying to solve).
But at the end of the day, if that design discussion doesn't happen
(or not enough to your satisfaction anyway), is that grounds for a
veto? For me it's a tough call, because on the one hand you have a
point, but on the other hand ... you're blocking _some_ progress
because the process behind it is not perfect (which is hard to do with
the 3.25 tuits we have left).

> P.S.  Could that BRANCH-README please state what's the problem the branch
> means to solve, i.e., the goal / acceptance test?  "Make it possible to
> «svn add» SHA-1 collisions"?

I agree that would be a good step.

I too find it a bit unclear what problem we're actually trying to
solve, apart from a vague feeling that SHA-1 will become more and more
broken over time, and that this will cause fatal injury to SVN (in its
WC, protocol, dump format, or repository). And perhaps the fact that
security auditors are becoming more and more triggered by seeing SHA-1
(even if they don't understand the way it is used and its
ramifications). Making it possible to 'svn add' SHA-1 collisions is
not it, I think.

-- 
Johan

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Nathan Hartman <ha...@gmail.com>.
On Thu, Feb 1, 2024 at 5:26 PM Daniel Sahlberg
<da...@gmail.com> wrote:
>
> Gentlemen,
>
> It seems you have both had your say about the flaws there have been in the process. Can we please leave this part of the discussion and continue on the technical issues? I'd hate for this discussion to turn to pie-throwing where someone in the end feels offended and leaves the community. We are such a small community and we can't afford to lose someone just because an argument turns toxic (it has happened before so let's make sure it doesn't happen again, please).

I completely agree. Yes, there has been disagreement about process,
but it is counterproductive to debate that anymore. Let's focus on the
technical question and try to reach some consensus on what (if
anything) to do.

> As for the technical side, can we break down the current status and the desired future status to some points and then look at what options we have for solutions?
>
> Currently we use SHA1, which has known attacks. What are the risks?
> - It has been argued that `svn st` will, especially with no-pristines, be extra vulnerable to not detecting a modified file if someone can create a collision with the checksum of the original file
> - Someone also argued that a software could potentially be banned just because it uses a checksum with a known attack, even if the checksum isn't used in a security critical way.

I was the one who spoke about that possibility.

Just one example: NIST has already recommended federal agencies to
stop using SHA-1 for "signatures and other operations threatened by
collision attacks" and by 31 Dec 2030 NIST will publish "a revision of
FIPS 180 that removes the SHA-1 specification" and "Modules that still
use SHA-1 after 2030 will not be permitted for purchase by the federal
government." All those quotes are taken from [1], which was one of the
top hits in a recent DuckDuckGo search. (I don't remember the exact
search.)

Now, even if SVN's use cases of SHA1 are agreed by the developers to
be completely safe, I think it is a real possibility that some sites
could ban SVN because they consider SHA1 a banned algorithm, and even
if we explain that SVN's use of SHA1 is completely safe, those
explanations might not be acceptable in those settings, even if we are
right.

Given the way technology is used, understood, and sometimes (often?)
misunderstood, I can imagine a ridiculous scenario in which Subversion
could use 8-bit CRC, but not SHA1, even though SHA1 is much stronger
than 8-bit CRC, just because SHA1 is "banned" and 8-bit CRC is not.
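
To put a number on "much stronger": an 8-bit CRC has only 256 possible
values, so any 257 distinct inputs must contain a collision. A small Python
sketch follows; the polynomial 0x07 and the two-byte test inputs are
arbitrary choices for the example, not anything Subversion uses.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with initial value 0 (polynomial chosen arbitrarily)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def find_crc8_collision():
    """Return two distinct inputs with the same CRC-8 (pigeonhole search)."""
    seen = {}
    for i in range(257):
        msg = i.to_bytes(2, "big")
        value = crc8(msg)
        if value in seen:
            return seen[value], msg
        seen[value] = msg
    raise AssertionError("unreachable: 257 inputs but only 256 CRC values")


first, second = find_crc8_collision()
```

The search needs at most 257 attempts, whereas the known SHA-1 collision
attacks still require a very large amount of computation.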

> What options do we have and how do they mitigate the above risks?
> - Evgeny has already shown a possible solution with a salted hash (keeping SHA-1).
> - Can we switch to another hash function completely and does it offer any benefits compared to the salted SHA-1?
> - Should we even do both?
>
> Any other points?
>
> Any thoughts?
>
> I would like to see this thread progress and I hope we can find consensus on a way forward.
>
> Kind regards,
> Daniel Sahlberg

I, too, hope the community can come together and reach a consensus,
whatever that ends up being.

[1] https://www.securityweek.com/nist-retire-27-year-old-sha-1-cryptographic-algorithm/

Cheers,
Nathan

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Daniel Sahlberg <da...@gmail.com>.
Gentlemen,

It seems you have both had your say about the flaws there have been in the
process. Can we please leave this part of the discussion and continue on
the technical issues? I'd hate for this discussion to turn to pie-throwing
where someone in the end feels offended and leaves the community. We are such
a small community and we can't afford to lose someone just because an
argument turns toxic (it has happened before so let's make sure it doesn't
happen again, please).

As for the technical side, can we break down the current status and the
desired future status to some points and then look at what options we have
for solutions?

Currently we use SHA1, which has known attacks. What are the risks?
- It has been argued that `svn st` will, especially with no-pristines, be
extra vulnerable to not detecting a modified file if someone can create a
collision with the checksum of the original file
- Someone also argued that a software could potentially be banned just
because it uses a checksum with a known attack, even if the checksum isn't
used in a security critical way.

What options do we have and how do they mitigate the above risks?
- Evgeny has already shown a possible solution with a salted hash (keeping
SHA-1).
- Can we switch to another hash function completely and does it offer any
benefits compared to the salted SHA-1?
- Should we even do both?
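
To make the first option concrete, here is a minimal sketch of what a
per-working-copy salted SHA-1 could look like. It is only an illustration
of the general idea, not the code on Evgeny's branch: the function names,
the salt length and the choice to mix the salt in as a prefix are all
assumptions made for this example. The point is that the recorded checksum
depends on a random value created together with the working copy, so a
collision pair prepared in advance against plain SHA-1 is not expected to
still collide under a salt the attacker could not know.

```python
import hashlib
import os

# Hypothetical salt length; the real branch may use a different size.
SALT_LEN = 32


def new_working_copy_salt() -> bytes:
    """Generate a random salt once, when the working copy is created."""
    return os.urandom(SALT_LEN)


def pristine_checksum(salt: bytes, content: bytes) -> str:
    """Salted SHA-1: hash the salt first, then the file content.

    Prefixing the salt is an assumption of this sketch.
    """
    h = hashlib.sha1()
    h.update(salt)
    h.update(content)
    return h.hexdigest()


def is_modified(salt: bytes, recorded: str, current_content: bytes) -> bool:
    """Checksum-only modification check, as a status check without
    pristines would have to do: compare against the recorded value."""
    return pristine_checksum(salt, current_content) != recorded


if __name__ == "__main__":
    salt = new_working_copy_salt()
    recorded = pristine_checksum(salt, b"original text\n")
    print(is_modified(salt, recorded, b"original text\n"))
    print(is_modified(salt, recorded, b"edited text\n"))
```

Two working copies holding the same file would record different checksums
here, which is also why a salted checksum cannot be compared directly with
the plain SHA-1 that the server reports.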

Any other points?

Any thoughts?

I would like to see this thread progress and I hope we can find consensus
on a way forward.

Kind regards,
Daniel Sahlberg


Den tors 18 jan. 2024 kl 14:36 skrev Evgeny Kotkov via dev <
dev@subversion.apache.org>:

> Daniel Shahaf <d....@daniel.shahaf.name> writes:
>
> > Procedurally, the long hiatus is counterproductive.
>
> This reminds me that the substantive discussion of your veto ended with my
> email from 8 Feb 2023 that had four direct questions to you and was left
> without an answer:
>
> ``````
>   > That's not how design discussions work.  A design discussion doesn't go
>   > "state decision; state pros; implement"; it goes "state problem; discuss
>   > potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).
>
>   Well, I think it may not be as simple as it seems to you.  Who decided that
>   we should follow the process you're describing?  Is there a thread with a
>   consensus on this topic?  Or do you insist on using this specific process
>   because it's the only process that seems obvious to you?  What alternatives
>   to it have been considered?
>
>   As far as I can tell, the process you're suggesting is effectively a
>   waterfall-like process, and there are quite a lot of concerns about its
>   effectiveness, because the decisions have to be made in the conditions of
>   a lack of information.
> ``````
>
> It's been more than 11 months since that email, and those questions still
> don't have an answer.  So if we are to resume this discussion, let's do it
> from the proper point.
>
> > You guys are welcome to try to /convince/ me to change my opinion, or to
> > have the veto invalidated.  In either case, you will be more likely to
> > succeed should your arguments relate not only to the veto's implications
> > but also to its /sine qua non/ component: its rationale.
>
> Just in case, my personal opinion here is that the veto is invalid.
>
> Firstly, based on my understanding, the ASF rules prohibit casting a veto
> without an appropriate technical justification (see [1], which I personally
> agree with).  Secondly, it seems that the process you are imposing hasn't been
> accepted in this community.  As far as I know, this topic was tangentially
> discussed before (see [2], for example), and it looks like there hasn't been
> a consensus to change our current Commit-Then-Review process into some
> sort of Review-Then-Commit.
>
> (At the same time I won't even try to /convince/ you, sorry.)
>
> [1] https://www.apache.org/foundation/voting.html
> [2] https://lists.apache.org/thread/ow2x68g2k4lv2ycr81d14p8r8w2jj1xl
>
>
> Regards,
> Evgeny Kotkov
>

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Evgeny Kotkov via dev <de...@subversion.apache.org>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> Procedurally, the long hiatus is counterproductive.

This reminds me that the substantive discussion of your veto ended with my
email from 8 Feb 2023 that had four direct questions to you and was left
without an answer:

``````
  > That's not how design discussions work.  A design discussion doesn't go
  > "state decision; state pros; implement"; it goes "state problem; discuss
  > potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

  Well, I think it may not be as simple as it seems to you.  Who decided that
  we should follow the process you're describing?  Is there a thread with a
  consensus on this topic?  Or do you insist on using this specific process
  because it's the only process that seems obvious to you?  What alternatives
  to it have been considered?

  As far as I can tell, the process you're suggesting is effectively a
  waterfall-like process, and there are quite a lot of concerns about its
  effectiveness, because the decisions have to be made in the conditions of
  a lack of information.
``````

It's been more than 11 months since that email, and those questions still
don't have an answer.  So if we are to resume this discussion, let's do it
from the proper point.

> You guys are welcome to try to /convince/ me to change my opinion, or to
> have the veto invalidated.  In either case, you will be more likely to
> succeed should your arguments relate not only to the veto's implications
> but also to its /sine qua non/ component: its rationale.

Just in case, my personal opinion here is that the veto is invalid.

Firstly, based on my understanding, the ASF rules prohibit casting a veto
without an appropriate technical justification (see [1], which I personally
agree with).  Secondly, it seems that the process you are imposing hasn't been
accepted in this community.  As far as I know, this topic was tangentially
discussed before (see [2], for example), and it looks like there hasn't been
a consensus to change our current Commit-Then-Review process into some
sort of Review-Then-Commit.

(At the same time I won't even try to /convince/ you, sorry.)

[1] https://www.apache.org/foundation/voting.html
[2] https://lists.apache.org/thread/ow2x68g2k4lv2ycr81d14p8r8w2jj1xl


Regards,
Evgeny Kotkov

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Karl Fogel wrote on Wed, 03 Jan 2024 22:13 +00:00:
> On 01 Apr 2023, Evgeny Kotkov via dev wrote:
> > Daniel Shahaf <d....@daniel.shahaf.name> writes:
> > 
> > > What's the question or action item to/for me?  Thanks.
> > 
> > I'm afraid I don't fully understand your question.  As you
> > probably remember, the change is blocked by your veto.  To my
> > knowledge, this veto hasn't been revoked as of now, and I simply
> > mentioned that in my email.  It is entirely your decision
> > whether or not to take any action regarding this matter.
> 
> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. Evgeny would
> like to merge this into trunk -- on the grounds, I believe, that it is
> strictly an improvement over what we have now, and it opens the door to
> further future improvements (each of which would go through the usual
> discussion & consensus process, of course).

So, I looked.

This thread comprises 237 posts spanning 30 months (July 2021 through
today).  On 2023-01-20 I cast a veto.  There was some activity
afterwards, but until the parent post of this one, the thread has been
silent for the better part of a year; and now I'm being asked to
withdraw my veto.

Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
I had the context in our heads, and the cache misses took their toll in
tuits and in wallclock time.  Furthermore, I have less spare time for
dev@ discussions than I did when I cast the veto (= a year ago next
Saturday).  Going forward it might be preferable for threads not to
hibernate.

You didn't link the veto, so I had to go grep for it.  It is,
presumably, this one:

>>>> # Archived-At: https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3C904aded6-5ef0-4123-ade0-e23a3bb56726%40app.fastmail.com%3E
>>>> Date: Fri, 20 Jan 2023 12:15:24 +0000
>>>> From: Daniel Shahaf
>>>> To: dev@subversion.apache.org
>>>> Subject: Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
>>>> Message-Id: <90...@app.fastmail.com>
>>>> 
>>>> Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
>>>> > I can complete the work on this branch and bring it to a production-ready
>>>> > state, assuming there are no objections.
>>>> 
>>>> Your assumption is counterfactual:
>>>> 
>>>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>>>> 
>>>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
>>>> 
>>>> Objections have been raised, been left unanswered, and now
>>>> implementation work has commenced following the original design.  That's
>>>> not acceptable.  I'm vetoing the change until a non-rubber-stamp design
>>>> discussion has been completed on the public dev@ list.

So, this veto being in front of me, let me reply to the request that
I withdraw it:

> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. Evgeny would
> like to merge this into trunk -- on the grounds, I believe, that it is
> strictly an improvement over what we have now, and it opens the door to
> further future improvements (each of which would go through the usual
> discussion & consensus process, of course).
> 
> Evgeny's work is on this branch...
> 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
> 
> ...which in turn branched from
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
> 
> I used this command to get an overview of the work:
> 
> $ svn cat https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README

As far as I can tell, the request for veto withdrawal is grounded only
in the fact that the veto, whilst in force, prevents the feature branch
from being merged/released.  The request does not allege the veto was
invalid or unfounded in the first place; nor that the veto has /become/
invalid or unfounded due to time having passed; nor that modifications
or alterations to the code [or, in this case, to the decision-making
process] have been made and are believed to have addressed the veto's
grounds.

In summary, the request only deals with the fact of a veto and its
formal/procedural implications, but does not deal with the substantive
justification for the veto at all.

That being the case, I have no reason to believe the original grounds of
the veto have been addressed.

That being the case, I have considered whether merging the feature
branch outweighs letting dev@ take a not-only-/pro forma/ role in
design discussions.  I am of the opinion that it does not, and
therefore I reäffirm the veto.

You guys are welcome to try to /convince/ me to change my opinion, or to
have the veto invalidated.  In either case, you will be more likely to
succeed should your arguments relate not only to the veto's implications
but also to its /sine qua non/ component: its rationale.

Before I salutate this post, I wish to point out that it's rather
ironic — or perhaps I should say /alarming/ — that the request for veto
withdrawal does not deal with the substantive grounds for the veto,
considering those grounds were "dev@ isn't being listened to".  In fact,
this is so inconsistent with the past 15+ years of kfogel interactions
that I feel I should ask whoever happens to live closest to kfogel's if
they would be so very kind as to pop over there, knock on the front
door, and tell him his email is being impersonated.  (Naturally, make
sure it's actually him at the door, first. :P)

Cheers,

Daniel

P.S.  Could that BRANCH-README please state what's the problem the branch
means to solve, i.e., the goal / acceptance test?  "Make it possible to
«svn add» SHA-1 collisions"?

> Evgeny's work is on this branch...
> 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
> 
> ...which in turn branched from
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
> 
> I used this command to get an overview of the work:
> 
> $ svn cat https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README
> 
> (The work is several months old now, but for the sake of discussion let's
> assume it's mergeable, passes all tests, etc. Obviously, Evgeny's only going
> to merge it when all of those conditions are true -- maybe some minor tweaks
> will be needed to get it there, I don't know.)


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Karl Fogel <kf...@red-bean.com>.
On 04 Jan 2024, Daniel Shahaf wrote:
>Acknowledging receipt.  I'll reply substantively when I have the 
>time to swap in the context.

Thanks.  Yeah, I went through the same context-swapping-in process 
yesterday before posting!

Best regards,
-Karl

>> Evgeny's work is on this branch...
>>
>> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
>>
>> ...which in turn branched from 
>> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
>>
>> I used this command to get an overview of the work:
>>
>> $ svn cat https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README
>>
>> (The work is several months old now, but for the sake of 
>> discussion let's assume it's mergeable, passes all tests, etc. 
>> Obviously, Evgeny's only going to merge it when all of those 
>> conditions are true -- maybe some minor tweaks will be needed to
>> get it there, I don't know.)
>>
>> Best regards,
>> -Karl

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Karl Fogel wrote on Wed, 03 Jan 2024 22:13 +00:00:
> On 01 Apr 2023, Evgeny Kotkov via dev wrote:
>>Daniel Shahaf <d....@daniel.shahaf.name> writes:
>>
>>> What's the question or action item to/for me?  Thanks.
>>
>>I'm afraid I don't fully understand your question.  As you
>>probably remember, the change is blocked by your veto.  To my
>>knowledge, this veto hasn't been revoked as of now, and I simply
>>mentioned that in my email.  It is entirely your decision
>>whether or not to take any action regarding this matter.
>
> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. 
> Evgeny would like to merge this into trunk -- on the grounds, I 
> believe, that it is strictly an improvement over what we have now, 
> and it opens the door to further future improvements (each of 
> which would go through the usual discussion & consensus process, 
> of course).
>

Acknowledging receipt.  I'll reply substantively when I have the time to swap in the context.

Daniel

> Evgeny's work is on this branch...
>
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
>
> ...which in turn branched from 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
>
> I used this command to get an overview of the work:
>
> $ svn cat https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README
>
> (The work is several months old now, but for the sake of 
> discussion let's assume it's mergeable, passes all tests, etc. 
> Obviously, Evgeny's only going to merge it when all of those 
> conditions are true -- maybe some minor tweaks will be needed to 
> get it there, I don't know.)
>
> Best regards,
> -Karl

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Karl Fogel <kf...@red-bean.com>.
On 01 Apr 2023, Evgeny Kotkov via dev wrote:
>Daniel Shahaf <d....@daniel.shahaf.name> writes:
>
>> What's the question or action item to/for me?  Thanks.
>
>I'm afraid I don't fully understand your question.  As you
>probably remember, the change is blocked by your veto.  To my
>knowledge, this veto hasn't been revoked as of now, and I simply
>mentioned that in my email.  It is entirely your decision
>whether or not to take any action regarding this matter.

So AIUI, Evgeny is asking you to withdraw your veto, Daniel. 
Evgeny would like to merge this into trunk -- on the grounds, I 
believe, that it is strictly an improvement over what we have now, 
and it opens the door to further future improvements (each of 
which would go through the usual discussion & consensus process, 
of course).

Evgeny's work is on this branch...

https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt

...which in turn branched from 
https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.

I used this command to get an overview of the work:

$ svn cat https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README

(The work is several months old now, but for the sake of 
discussion let's assume it's mergeable, passes all tests, etc. 
Obviously, Evgeny's only going to merge it when all of those 
conditions are true -- maybe some minor tweaks will be needed to 
get it there, I don't know.)

Best regards,
-Karl

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Evgeny Kotkov via dev <de...@subversion.apache.org>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> What's the question or action item to/for me?  Thanks.

I'm afraid I don't fully understand your question.  As you probably remember,
the change is blocked by your veto.  To my knowledge, this veto hasn't been
revoked as of now, and I simply mentioned that in my email.  It is entirely
your decision whether or not to take any action regarding this matter.


Thanks,
Evgeny Kotkov

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Evgeny Kotkov via dev wrote on Wed, 22 Mar 2023 15:23 +00:00:
> This change is still being blocked by a veto, but if danielsh changes his
> mind and if there won't be other objections, I'm ready to complete the few
> remaining bits and merge it to trunk.

What's the question or action item to/for me?  Thanks.

Daniel