Posted to dev@subversion.apache.org by Paul Hammant <pa...@hammant.org> on 2019/05/15 06:20:25 UTC

SHA1 collisions became cheaper to create.

Article: https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/

Subversion computes a SHA1 hash for each resource held. It is certainly
available as part of the detail for a file/resource, but I don't know
to what extent the PUT logic relies on it.

The ZDNet article talks of better algorithms, but perhaps isn't an
authority on which one is best. I wonder if a pluggable design would
work. Separately, a mechanism for the server to reject a Subversion
client as too old may be needed.

- Paul

Re: SHA1 collisions became cheaper to create.

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Wed, May 15, 2019 at 07:44:41 -0400:
> On Wed, May 15, 2019 at 07:20:25AM +0100, Paul Hammant wrote:
> > Article: https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/
> > 
> > Subversion computes a SHA1 hash for each resource held. It is certainly
> > available as part of the detail for a file/resource, but I don't know
> > to what extent the PUT logic relies on it.
> > 
> > The ZDNet article talks of better algorithms, but perhaps isn't an
> > authority on which one is best. I wonder if a pluggable design would
> > work. Separately, a mechanism for the server to reject a Subversion
> > client as too old may be needed.
> > 
> > - Paul
> 
> I don't see a way to break repositories with SHA1 collisions in current
> versions of SVN.
> 
> Duplicate content has been rejected on the server ever since the first
> SHA1 collision was found. This of course happens regardless of which
> particular content produces a collision, and of how easy it is for
> researchers to find colliding content. Also, storing any two distinct
> contents with the same SHA1 checksum has been explicitly declared out
> of scope for SVN.
> 
> So I see no new consequences for SVN from this development.
> Am I missing something?

There are these two scripts:

tools/hook-scripts/reject-detected-sha1-collisions.sh
tools/hook-scripts/reject-known-sha1-collisions.sh

Collisions created by the new attack will not be recognized by the
second script and might or might not be recognized by the first script.
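
(For illustration, a rough Python sketch of what the "known" variant's
check boils down to; the real script is shell, and its details differ.
The digest below is the widely published one shared by the two SHAttered
PDFs.  The "detected" variant, as I understand it, instead runs content
through a collision-detecting SHA1 implementation such as sha1dc, which
flags the attack's telltale disturbance patterns rather than matching a
fixed list.)

    import hashlib

    # SHA1 hex digest shared by the two SHAttered collision PDFs.
    KNOWN_COLLISION_SHA1S = {
        "38762cf7f55934b34d179ae6a4c80cadccbb7f0a",
    }

    def reject_if_known_collision(content):
        """Refuse content whose SHA1 matches a published collision digest."""
        digest = hashlib.sha1(content).hexdigest()
        if digest in KNOWN_COLLISION_SHA1S:
            raise ValueError("SHA1 %s has a known collision; rejecting" % digest)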

For future reference, Subversion does not depend on either of these two
scripts for correct operation.  The fixes stsp and I referred to are
implemented directly in the server-side logic, in C.

Cheers,

Daniel

Re: SHA1 collisions became cheaper to create.

Posted by Branko Čibej <br...@apache.org>.
On 22.05.2019 01:11, Paul Hammant wrote:
> > Why a Merkle tree? One of Subversion's strengths is its linear
> > revision history. You could use a blockchain and get
> > financial-strength auditability.
>
> Blockchain is so oversold. There's a whole class of application where
> the bastard child of Git and Subversion would be a perfect
> non-repudiable and tamper-evident store: ones where distributed
> consensus that's vulnerable to 51% attacks isn't actually needed.

+1, good one :D


Re: SHA1 collisions became cheaper to create.

Posted by Paul Hammant <pa...@hammant.org>.
> Why a Merkle tree? One of Subversion's strengths is its linear revision
> history. You could use a blockchain and get financial-strength
> auditability.

Blockchain is so oversold. There's a whole class of application where the
bastard child of Git and Subversion would be a perfect non-repudiable and
tamper-evident store: ones where distributed consensus that's vulnerable to
51% attacks isn't actually needed.

Re: SHA1 collisions became cheaper to create.

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Nathan Hartman wrote on Tue, 21 May 2019 13:32 +00:00:
> Why a Merkle tree? One of Subversion's strengths is its linear revision
> history. You could use a blockchain and get financial-strength
> auditability.

Handwave, but doesn't that basically boil down to including not
only a node-rev-id but also a checksum of that node-rev's header,
both in "pred:" lines in node-rev headers and in "V" lines in
directory reps?
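
(A toy Python sketch of that chaining idea; the ids and field names are
made up for illustration and are not real FSFS syntax.  The point is
that each "pred:" line would pin the exact bytes of the predecessor's
header, so tampering with any historical node-rev breaks every later
link in the chain.)

    import hashlib

    def header_digest(header):
        """Checksum over a node-rev header's serialized text."""
        return hashlib.sha256(header.encode()).hexdigest()

    def make_header(node_rev_id, pred_header=None):
        """Build a header whose pred: line pins the predecessor's bytes."""
        lines = ["id: " + node_rev_id]
        if pred_header is not None:
            pred_id = pred_header.splitlines()[0].split(": ", 1)[1]
            lines.append("pred: %s %s" % (pred_id, header_digest(pred_header)))
        return "\n".join(lines) + "\n"

    r1 = make_header("0.0.r1/17")
    r2 = make_header("0.0.r2/42", r1)  # editing r1 now invalidates r2's pred: line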

Re: SHA1 collisions became cheaper to create.

Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, May 21, 2019 at 1:06 AM Paul Hammant <pa...@hammant.org> wrote:

> The Git folks moved to a hardened SHA1 function as an interim measure
> on the way to SHA-256 -
>
> https://github.com/git/git/blob/master/Documentation/technical/hash-function-transition.txt
>
> I think you're generally right. While I might think that an auditor
> would simply be advised of the root hash of a Merkle tree for a
> branch at a moment in time, or a tag, Subversion doesn't have a
> Merkle tree under the hood. I coded something niche to retrofit
> Subversion with that, but it's not core and far from perfect, as it
> relies on an LRU cache and keeps no history itself. Git's Merkle tree
> would be perfect if it didn't blow up when repos get too big, and if
> it allowed cloning from nodes other than the root (branches and tags
> are relative to the root, of course). So, ignore me here.


Why a Merkle tree? One of Subversion's strengths is its linear revision
history. You could use a blockchain and get financial-strength auditability.

Re: SHA1 collisions became cheaper to create.

Posted by Paul Hammant <pa...@hammant.org>.
The Git folks moved to a hardened SHA1 function as an interim measure
on the way to SHA-256 -
https://github.com/git/git/blob/master/Documentation/technical/hash-function-transition.txt

I think you're generally right. While I might think that an auditor
would simply be advised of the root hash of a Merkle tree for a
branch at a moment in time, or a tag, Subversion doesn't have a
Merkle tree under the hood. I coded something niche to retrofit
Subversion with that, but it's not core and far from perfect, as it
relies on an LRU cache and keeps no history itself. Git's Merkle tree
would be perfect if it didn't blow up when repos get too big, and if
it allowed cloning from nodes other than the root (branches and tags
are relative to the root, of course). So, ignore me here.

Re: SHA1 collisions became cheaper to create.

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Paul Hammant wrote on Wed, 15 May 2019 13:03 +00:00:
> Problem I'm trying to solve: In an audit situation where prior commits
> were to be analyzed, a sufficiently motivated repo owner could tell
> the auditor that black was white in respect of a certain historical
> commit. This assumes the auditor had prior SHA1s (in lieu of a full
> Merkle tree) for the resources at the historical revision under audit.

Agreed: sha1 is not suitable for that auditor's use-case.  So what?
Nothing requires the auditor to use the same checksum algorithm that
Subversion uses internally.  The auditor should use a stronger hash
algorithm, even if 'svn info' doesn't provide it cheaply.
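
(For instance, the auditor could record out-of-band digests over an
exported tree using a stronger algorithm.  A rough Python sketch; the
export path is hypothetical.)

    import hashlib
    import os

    def audit_digests(export_root):
        """Map relative path -> BLAKE2 digest for an 'svn export' tree."""
        digests = {}
        for dirpath, _dirnames, filenames in os.walk(export_root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    rel = os.path.relpath(path, export_root)
                    digests[rel] = hashlib.blake2b(f.read()).hexdigest()
        return digests

    # e.g. store audit_digests("export-of-r1234") alongside the audit report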

If you'd like to argue that we should provide blake2 checksums cheaply
(like 'svn info' provides the md5/sha1), that'd be one thing, but I
think you're arguing that we should migrate off sha1 as the primary key
of rep-cache and pristine stores, and I don't see a reason to do that.

Ultimately, in Subversion, if you can change a fulltext, you can also
change the metadata that records the checksum of that fulltext.  That's
true both in the client and in the server.  Therefore, even if we
replaced sha1 with crc32 throughout the code, all that would happen is
that commits failing with "Server rejected the file because it
collides with some other file"[1] would become *way* more common; there
wouldn't be an attack vector.

What am I missing?

Cheers,

Daniel

[1] Assuming the committer used 'svnmucc put', since otherwise 'svn add'
    might fail first with a similar error.

> Granted, PDF payloads (& other large encoded-stream binaries like
> movies) are susceptible to such retroactive fakery, whereas
> CR-delimited text files with plausible content are not retroactively
> fake-able without that being clear to the eye: "Hey, that's not a C
> source file".
> 
> Feel free to ignore - this can wait a number of years.

Re: SHA1 collisions became cheaper to create.

Posted by Paul Hammant <pa...@hammant.org>.
Yes, Subversion would remain good at keeping versions of honest development work.

Problem I'm trying to solve: In an audit situation where prior commits
were to be analyzed, a sufficiently motivated repo owner could tell
the auditor that black was white in respect of a certain historical
commit. This assumes the auditor had prior SHA1s (in lieu of a full
Merkle tree) for the resources at the historical revision under audit.

Granted, PDF payloads (& other large encoded-stream binaries like
movies) are susceptible to such retroactive fakery, whereas
CR-delimited text files with plausible content are not retroactively
fake-able without that being clear to the eye: "Hey, that's not a C
source file".

Feel free to ignore - this can wait a number of years.

Re: SHA1 collisions became cheaper to create.

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Paul Hammant wrote on Wed, 15 May 2019 12:39 +00:00:
> I'm suggesting phasing out SHA1: during a v1.x to v1.x+1 upgrade,
> run a migration script for all content to gain (say) BLAKE2 hashes
> *instead*, and for that install, clients with incompatible hashing
> are rejected.
> 
> There are alternatives too, where up to a moment in time a repo has
> SHA1s, and thereafter has some other algo.

Hold your horses.  *Why* are you proposing to phase out sha1?

For example, is it out of general concerns that a cheap preimage attack
will be discovered before long?  Or do you see a specific way to use the
new attack against working copies or repositories?  Or something else?

Once we've established that, we can discuss *what* to do... but you're
getting ahead of yourself by discussing *how* to phase out sha1 before
we've established *that* (arguendo) it's the right course of action.

Cheers,

Daniel

Re: SHA1 collisions became cheaper to create.

Posted by Paul Hammant <pa...@hammant.org>.
I'm suggesting phasing out SHA1: during a v1.x to v1.x+1 upgrade,
run a migration script for all content to gain (say) BLAKE2 hashes
*instead*, and for that install, clients with incompatible hashing
are rejected.

There are alternatives too, where up to a moment in time a repo has
SHA1s, and thereafter has some other algo.
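
(Schematically, that cutover variant might look like this: a purely
illustrative Python sketch, nothing like Subversion's actual storage
code, and the cutover revision is made up.)

    import hashlib

    CUTOVER_REV = 5000  # hypothetical: last revision still keyed by SHA1

    def content_key(rev, content):
        """Key content by the hash algorithm in force at revision rev."""
        if rev <= CUTOVER_REV:
            return ("sha1", hashlib.sha1(content).hexdigest())
        return ("blake2b", hashlib.blake2b(content).hexdigest())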

Re: SHA1 collisions became cheaper to create.

Posted by Stefan Sperling <st...@elego.de>.
On Wed, May 15, 2019 at 07:20:25AM +0100, Paul Hammant wrote:
> Article: https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/
> 
> Subversion computes a SHA1 hash for each resource held. It is certainly
> available as part of the detail for a file/resource, but I don't know
> to what extent the PUT logic relies on it.
> 
> The ZDNet article talks of better algorithms, but perhaps isn't an
> authority on which one is best. I wonder if a pluggable design would
> work. Separately, a mechanism for the server to reject a Subversion
> client as too old may be needed.
> 
> - Paul

I don't see a way to break repositories with SHA1 collisions in current
versions of SVN.

Duplicate content has been rejected on the server ever since the first
SHA1 collision was found. This of course happens regardless of which
particular content produces a collision, and of how easy it is for
researchers to find colliding content. Also, storing any two distinct
contents with the same SHA1 checksum has been explicitly declared out
of scope for SVN.
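
(In outline, the check amounts to something like the following Python
sketch; the real logic lives in the C server code and differs in
detail.  The rep cache maps a SHA1 digest to an already-stored
representation, and a fulltext whose digest is present but whose bytes
differ is refused rather than silently deduplicated.)

    import hashlib

    rep_cache = {}  # sha1 hex digest -> stored fulltext bytes

    def store(content):
        digest = hashlib.sha1(content).hexdigest()
        if digest in rep_cache:
            if rep_cache[digest] != content:
                # Same SHA1, different bytes: a collision, so refuse it.
                raise ValueError("SHA1 collision detected; commit rejected")
            return digest  # identical content: safe to share the rep
        rep_cache[digest] = content
        return digest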

So I see no new consequences for SVN from this development.
Am I missing something?