You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by st...@apache.org on 2017/05/09 19:07:09 UTC

svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Author: stsp
Date: Tue May  9 19:07:09 2017
New Revision: 1794632

URL: http://svn.apache.org/viewvc?rev=1794632&view=rev
Log:
* notes/sha1-advisory.txt: wording tweak

Modified:
    subversion/trunk/notes/sha1-advisory.txt

Modified: subversion/trunk/notes/sha1-advisory.txt
URL: http://svn.apache.org/viewvc/subversion/trunk/notes/sha1-advisory.txt?rev=1794632&r1=1794631&r2=1794632&view=diff
==============================================================================
--- subversion/trunk/notes/sha1-advisory.txt (original)
+++ subversion/trunk/notes/sha1-advisory.txt Tue May  9 19:07:09 2017
@@ -27,8 +27,8 @@ Details:
 
   If both of these files are committed to a Subversion repository, Subversion
   de-duplicates content based on the SHA1 checksum and only the content of
-  one of the files ends up being stored in the repository. However, the meta
-  data stores the MD5 checksums of both files, and these MD5 checksums differ.
+  one of the files ends up being stored in the repository. However, meta data
+  stores the MD5 checksums of both files, and these MD5 checksums differ.
   This causes problems when Subversion eventually uses the MD5 checksum of
   the content which was not in fact stored. For example, updates and commits
   may no longer be possible due to an apparent checksum error.



Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Wed, May 10, 2017 at 13:34:07 +0200:
> On Wed, May 10, 2017 at 09:11:50AM +0000, Daniel Shahaf wrote:
> > > Details:
> > > ========
> > > 
> > >   In February 2017 a group of researchers released two PDF files which have
> > >   different content but produce the same SHA1 checksum. This was the first
> > >   publicly known SHA1 collision ever produced.
> > > 
> > >   If both of these files are committed to a Subversion repository, Subversion
> > >   de-duplicates content based on the SHA1 checksum and only the content of
> > 
> > Missing qualifiers: only for FSFS and FSX and only if rep-sharing is
> > enabled.  (I see the "Recommendations" section says that, but I think
> > they belong here.)
> 
> Ack, fixed.

You've added the "only FSFS and FSX" part but not the "only if
rep-sharing is enabled" part.  Was that intentional?

(I see the "Recommendations" section mentions enabling rep-sharing
post-upgrade, but that's separate to stating that repositories that have
rep-sharing disabled _today_ are not vulnerable.)

> > Moreover, the error is not spurious; on the contrary: it functions
> > exactly as designed, and prevents the wrong file from being used.  Let's
> > say this in the advisory?
> 
> Too much detail, I'd say.

My intention with this text was to avoid readers forming the (mistaken)
impression that the md5 checksum error is a bug and concluding that
libsvn is brittle.  The md5 error is a safety net functioning as
designed.  I know this; you know this; the reader might not.

> > >   introduced the problem. This dump file can be loaded into a new repository.
> > >   If there were more commits after the problematic revision then dump and load
> > >   all of these subsequent revisions as well.
> > 
> > Mention 'svndumpfilter exclude'?
> 
> That may be a third possibility. But has anyone tested it?

I just did:

[[[
% svnlook tree r
/
 shattered-1.pdf
 shattered-2.pdf
% svnadmin create r2 
% svnadmin dump -q r | svndumpfilter exclude /shattered-2.pdf | svnadmin load r2  
Excluding prefixes:
   '/shattered-2.pdf'

Revision 0 committed as 0.
Revision 1 committed as 1.
<<< Started new transaction, based on original revision 1

Dropped 1 node:
   '/shattered-2.pdf'

     * editing path : shattered-1.pdf ... done.

------- Committed revision 1 >>>

% svn cat file://$PWD/r2/shattered-1.pdf | md5sum
ee4aa52b139d925f8d8884402b0a750c  -
% svn cat file://$PWD/r/shattered-1.pdf | md5sum
ee4aa52b139d925f8d8884402b0a750c  -
]]]

Thanks for the fixes!

Cheers,

Daniel


Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Stefan Sperling <st...@elego.de>.
On Thu, May 11, 2017 at 05:49:13AM +0000, Daniel Shahaf wrote:
> Daniel Shahaf wrote on Thu, May 11, 2017 at 05:20:53 +0000:
> > Johan Corveleyn wrote on Thu, May 11, 2017 at 01:34:18 +0200:
> > > Maybe something like this?
> > > 
> > > "Subversion repositories can be broken, becoming partly inaccessible,
> > > by committing two files which have different content, yet produce the
> > > same SHA1 checksum. There is no data loss, but parts of the repository
> > > can no longer be checked out or committed into."
> > 
> > Well, there _is_ data loss, so:
> > 
> >     Subversion fails to store a file that has the same sha1 as another
> >     file in the repository.  Attempts to retrieve the first file would
> >     fail with a checksum error (from the md5 checksum that we also use),
> >     however, if the two files had not only equal sha1's but also equal md5's,
> >     then the wrong content would silently be returned.
> > 
> > Plus a blurb about how that's not going to ever happen by accident.
> 
> Oops, that was a suggested Details section, but we're talking about the
> Summary section.  Pretend I suggested:
> 
>     Subversion repositories, in the default configuration, fail to store
>     a file that has the same SHA-1 checksum as another file.

Please just go ahead and commit any changes you want to make to the file.
That would be a bit easier to follow and review.

We still have some time before we need a final version we all agree on.

Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Thu, May 11, 2017 at 05:20:53 +0000:
> Johan Corveleyn wrote on Thu, May 11, 2017 at 01:34:18 +0200:
> > Maybe something like this?
> > 
> > "Subversion repositories can be broken, becoming partly inaccessible,
> > by committing two files which have different content, yet produce the
> > same SHA1 checksum. There is no data loss, but parts of the repository
> > can no longer be checked out or committed into."
> 
> Well, there _is_ data loss, so:
> 
>     Subversion fails to store a file that has the same sha1 as another
>     file in the repository.  Attempts to retrieve the first file would
>     fail with a checksum error (from the md5 checksum that we also use),
>     however, if the two files had not only equal sha1's but also equal md5's,
>     then the wrong content would silently be returned.
> 
> Plus a blurb about how that's not going to ever happen by accident.

Oops, that was a suggested Details section, but we're talking about the
Summary section.  Pretend I suggested:

    Subversion repositories, in the default configuration, fail to store
    a file that has the same SHA-1 checksum as another file.

Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Thu, May 11, 2017 at 01:34:18 +0200:
> On Wed, May 10, 2017 at 1:44 PM, Bert Huijben <be...@qqmail.nl> wrote:
> >> -----Original Message-----
> >> From: Stefan Sperling [mailto:stsp@apache.org]
> >> Sent: woensdag 10 mei 2017 13:34
> >> To: Daniel Shahaf <d....@daniel.shahaf.name>
> >> Cc: dev@subversion.apache.org; commits@subversion.apache.org
> >> Subject: Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-
> >> advisory.txt
> >>
> >> On Wed, May 10, 2017 at 09:11:50AM +0000, Daniel Shahaf wrote:
> >> > > Summary:
> >> > > ========
> >> > >
> >> > >   Subversion repositories can be corrupted by committing two files
> >> > >   which have different content, yet produce the same SHA1 checksum.
> >> >
> >> > I don't think we should call this "corruption": the on-disk data
> >> > structures are intact, both syntactically and semantically.
> 
> Except the committed collision files, I guess. IIUC only one of both
> contents is stored, right? But okay, apart from those sha1-colliding
> contents themselves, the rest is in principle intact (but rendered
> partially inaccessible).

Right: only one of the two files is stored.  (After 'svn import
shattered[12].pdf', there is only one ENDREP in the revision file.)

So, strictly speaking, this is a data loss for users who deliberately
commit sha1 collisions into their repositories.  We've already
determined that there are two such use-cases: the webkit use-case — who
can just re-download shattered1.pdf — and researchers, who ought to have
disabled rep-sharing in the first place.  That is: it's not a very
severe issue for the average user.

> >> > The problem
> >> > is in the libraries' assumption that sha1 has no collisions.
> >> >
> >> > I'm afraid I don't have a good suggestion; perhaps "Distinct files that
> >> > have equal sha1 checksums cannot be checked out"?
> >>
> >> I think we should call it corruption simply because it looks like
> >> that to our users when it happens (see webkit).
> >>
> >> This is a user-facing text. We want users to take action and upgrade so
> >> they won't run into the problem. The purpose of this text is to raise
> >> awareness. It is not to communicate technical details of the problem,
> >> which can be obtained by other means (reading code, mailing lists, etc.)
> >>
> >> I expect "corruption" will turn on people's alarm bells more than your
> >> suggested wording which is very exact but also sounds less dramatic.
> >
> > Those alarm bells are the reason why I wouldn't call it corruption, as that
> > part will probably be highlighted in the media, while there is nothing
> > corrupt on disk.
> 
> Hm, I think Bert has a point. We should find the middle ground between
> making it dramatic enough (to attract attention and strong motivation
> to upgrade), but not causing panic and making this the only quote that
> will appear in the media.
> 
> Maybe something like this?
> 
> "Subversion repositories can be broken, becoming partly inaccessible,
> by committing two files which have different content, yet produce the
> same SHA1 checksum. There is no data loss, but parts of the repository
> can no longer be checked out or committed into."

Well, there _is_ data loss, so:

    Subversion fails to store a file that has the same sha1 as another
    file in the repository.  Attempts to retrieve the first file would
    fail with a checksum error (from the md5 checksum that we also use),
    however, if the two files had not only equal sha1's but also equal md5's,
    then the wrong content would silently be returned.

Plus a blurb about how that's not going to ever happen by accident.

> Separately: maybe we should include a reference to the pre-commit hook
> script, for completeness of the possible ways to protect oneself. I
> think it will be clear that upgrading to 1.9.6 will be the better fix,
> but that the pre-commit hook might buy you some time if you can't
> upgrade immediately.

+1


Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, May 10, 2017 at 1:44 PM, Bert Huijben <be...@qqmail.nl> wrote:
>> -----Original Message-----
>> From: Stefan Sperling [mailto:stsp@apache.org]
>> Sent: woensdag 10 mei 2017 13:34
>> To: Daniel Shahaf <d....@daniel.shahaf.name>
>> Cc: dev@subversion.apache.org; commits@subversion.apache.org
>> Subject: Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-
>> advisory.txt
>>
>> On Wed, May 10, 2017 at 09:11:50AM +0000, Daniel Shahaf wrote:
>> > > Summary:
>> > > ========
>> > >
>> > >   Subversion repositories can be corrupted by committing two files
>> > >   which have different content, yet produce the same SHA1 checksum.
>> >
>> > I don't think we should call this "corruption": the on-disk data
>> > structures are intact, both syntactically and semantically.

Except the committed collision files, I guess. IIUC only one of both
contents is stored, right? But okay, apart from those sha1-colliding
contents themselves, the rest is in principle intact (but rendered
partially inaccessible).

>> > The problem
>> > is in the libraries' assumption that sha1 has no collisions.
>> >
>> > I'm afraid I don't have a good suggestion; perhaps "Distinct files that
>> > have equal sha1 checksums cannot be checked out"?
>>
>> I think we should call it corruption simply because it looks like
>> that to our users when it happens (see webkit).
>>
>> This is a user-facing text. We want users to take action and upgrade so
>> they won't run into the problem. The purpose of this text is to raise
>> awareness. It is not to communicate technical details of the problem,
>> which can be obtained by other means (reading code, mailing lists, etc.)
>>
>> I expect "corruption" will turn on people's alarm bells more than your
>> suggested wording which is very exact but also sounds less dramatic.
>
> Those alarm bells are the reason why I wouldn't call it corruption, as that
> part will probably be highlighted in the media, while there is nothing
> corrupt on disk.

Hm, I think Bert has a point. We should find the middle ground between
making it dramatic enough (to attract attention and strong motivation
to upgrade), but not causing panic and making this the only quote that
will appear in the media.

Maybe something like this?

"Subversion repositories can be broken, becoming partly inaccessible,
by committing two files which have different content, yet produce the
same SHA1 checksum. There is no data loss, but parts of the repository
can no longer be checked out or committed into."

Separately: maybe we should include a reference to the pre-commit hook
script, for completeness of the possible ways to protect oneself. I
think it will be clear that upgrading to 1.9.6 will be the better fix,
but that the pre-commit hook might buy you some time if you can't
upgrade immediately.

-- 
Johan

RE: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Stefan Sperling [mailto:stsp@apache.org]
> Sent: woensdag 10 mei 2017 13:34
> To: Daniel Shahaf <d....@daniel.shahaf.name>
> Cc: dev@subversion.apache.org; commits@subversion.apache.org
> Subject: Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-
> advisory.txt
> 
> On Wed, May 10, 2017 at 09:11:50AM +0000, Daniel Shahaf wrote:
> > > Summary:
> > > ========
> > >
> > >   Subversion repositories can be corrupted by committing two files
> > >   which have different content, yet produce the same SHA1 checksum.
> >
> > I don't think we should call this "corruption": the on-disk data
> > structures are intact, both syntactically and semantically.  The problem
> > is in the libraries' assumption that sha1 has no collisions.
> >
> > I'm afraid I don't have a good suggestion; perhaps "Distinct files that
> > have equal sha1 checksums cannot be checked out"?
> 
> I think we should call it corruption simply because it looks like
> that to our users when it happens (see webkit).
> 
> This is a user-facing text. We want users to take action and upgrade so
> they won't run into the problem. The purpose of this text is to raise
> awareness. It is not to communicate technical details of the problem,
> which can be obtained by other means (reading code, mailing lists, etc.)
> 
> I expect "corruption" will turn on people's alarm bells more than your
> suggested wording which is very exact but also sounds less dramatic.

Those alarm bells are the reason why I wouldn't call it corruption, as that
part will probably be highlighted in the media, while there is nothing
corrupt on disk.

	Bert


Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Stefan Sperling <st...@apache.org>.
On Wed, May 10, 2017 at 09:11:50AM +0000, Daniel Shahaf wrote:
> > Summary:
> > ========
> > 
> >   Subversion repositories can be corrupted by committing two files
> >   which have different content, yet produce the same SHA1 checksum.
> 
> I don't think we should call this "corruption": the on-disk data
> structures are intact, both syntactically and semantically.  The problem
> is in the libraries' assumption that sha1 has no collisions.
> 
> I'm afraid I don't have a good suggestion; perhaps "Distinct files that
> have equal sha1 checksums cannot be checked out"?

I think we should call it corruption simply because it looks like
that to our users when it happens (see webkit).

This is a user-facing text. We want users to take action and upgrade so
they won't run into the problem. The purpose of this text is to raise
awareness. It is not to communicate technical details of the problem,
which can be obtained by other means (reading code, mailing lists, etc.)

I expect "corruption" will turn on people's alarm bells more than your
suggested wording which is very exact but also sounds less dramatic.

> > Details:
> > ========
> > 
> >   In February 2017 a group of researchers released two PDF files which have
> >   different content but produce the same SHA1 checksum. This was the first
> >   publicly known SHA1 collision ever produced.
> > 
> >   If both of these files are committed to a Subversion repository, Subversion
> >   de-duplicates content based on the SHA1 checksum and only the content of
> 
> Missing qualifiers: only for FSFS and FSX and only if rep-sharing is
> enabled.  (I see the "Recommendations" section says that, but I think
> they belong here.)

Ack, fixed.

> >   one of the files ends up being stored in the repository. However, meta data
> >   stores the MD5 checksums of both files, and these MD5 checksums differ.
> >   This causes problems when Subversion eventually uses the MD5 checksum of
> >   the content which was not in fact stored. For example, updates and commits
> >   may no longer be possible due to an apparent checksum error.
> 
> It'd be shorter and clearer to say "may fail with a checksum error".

Yes. I like this much better, too!

> Moreover, the error is not spurious; on the contrary: it functions
> exactly as designed, and prevents the wrong file from being used.  Let's
> say this in the advisory?

Too much detail, I'd say.

> The problem is that in a few weeks, when the 'shattered' exploit code is
> released, it'll be affordable to create files that collide sha1 and md5
> simultaneously, so the md5 checksum error will no longer happen.  At
> that point we may have to revise this advisory.

Well, this section describes what happens with these specific two PDF files.
I think it's fine as it is. Once people have installed our patch they don't
need to worry about future collisions anyway.

And further in the text, we do explain that there is a general reliance on
SHA1 on the client side as well. Smart cookies will infer the rest.

> > Recommendations:
> > ================
> > 
> >   We recommend all users update to Subversion 1.9.6 which will reject any
> >   commit that would create a SHA1 collision.
> (Nitpick: s/update/upgrade/)
> >   Note that this fix only works if the "representation-sharing" feature is
> >   enabled (it is enabled by default). If the file db/fsfs.conf inside the
> >   repository contains 'enable-rep-sharing = false', this option must be
> >   set to 'true' after upgrading to 1.9.6.
> > 
> >   One solution is just to delete the second file. This will resolve this
> >   problem for normal SVN client usage, but it will not work for tools like
> >   svnsync or git-svn which try to replay every revision in the repository.
> >   This will run into an error on the revision where the content was committed
> >   and the tool will not be able to proceed.
> 
> s/This/they/; s/the tool//

Sure.

> 
> >   A second solution would be to remove the problematic revision with svnadmin.
> >   svnadmin dump can be used to dump the repository up to the revision that
> 
> Quote the command name 'svnadmin dump'?

Sure.

> 
> >   introduced the problem. This dump file can be loaded into a new repository.
> >   If there were more commits after the problematic revision then dump and load
> >   all of these subsequent revisions as well.
> 
> Mention 'svndumpfilter exclude'?

That may be a third possibility. But has anyone tested it?

> >   Another option is to create a Subversion permission rule (authz) that blocks
> >   access to the one or both of the files. This will work with tools like
> >   svnsync and git-svn as the server will not send the colliding content.
> 
> I suggest to give an example; people might not realise that it's
> possible to write authz rules for single files (as opposed to
> directories).  E.g.,
> 
>      [/trunk/tests/data/shattered1.pdf]
>      * =
>      [/trunk/tests/data/shattered2.pdf]
>      * =

Indeed, I agree.

I have committed all the aforementioned changes in r1794707.

Thanks!

Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Stefan Sperling <st...@apache.org>.
On Wed, May 10, 2017 at 09:11:50AM +0000, Daniel Shahaf wrote:
> > Summary:
> > ========
> > 
> >   Subversion repositories can be corrupted by committing two files
> >   which have different content, yet produce the same SHA1 checksum.
> 
> I don't think we should call this "corruption": the on-disk data
> structures are intact, both syntactically and semantically.  The problem
> is in the libraries' assumption that sha1 has no collisions.
> 
> I'm afraid I don't have a good suggestion; perhaps "Distinct files that
> have equal sha1 checksums cannot be checked out"?

I think we should call it corruption simply because it looks like
that to our users when it happens (see webkit).

This is a user-facing text. We want users to take action and upgrade so
they won't run into the problem. The purpose of this text is to raise
awareness. It is not to communicate technical details of the problem,
which can be obtained by other means (reading code, mailing lists, etc.)

I expect "corruption" will turn on people's alarm bells more than your
suggested wording which is very exact but also sounds less dramatic.

> > Details:
> > ========
> > 
> >   In February 2017 a group of researchers released two PDF files which have
> >   different content but produce the same SHA1 checksum. This was the first
> >   publicly known SHA1 collision ever produced.
> > 
> >   If both of these files are committed to a Subversion repository, Subversion
> >   de-duplicates content based on the SHA1 checksum and only the content of
> 
> Missing qualifiers: only for FSFS and FSX and only if rep-sharing is
> enabled.  (I see the "Recommendations" section says that, but I think
> they belong here.)

Ack, fixed.

> >   one of the files ends up being stored in the repository. However, meta data
> >   stores the MD5 checksums of both files, and these MD5 checksums differ.
> >   This causes problems when Subversion eventually uses the MD5 checksum of
> >   the content which was not in fact stored. For example, updates and commits
> >   may no longer be possible due to an apparent checksum error.
> 
> It'd be shorter and clearer to say "may fail with a checksum error".

Yes. I like this much better, too!

> Moreover, the error is not spurious; on the contrary: it functions
> exactly as designed, and prevents the wrong file from being used.  Let's
> say this in the advisory?

Too much detail, I'd say.

> The problem is that in a few weeks, when the 'shattered' exploit code is
> released, it'll be affordable to create files that collide sha1 and md5
> simultaneously, so the md5 checksum error will no longer happen.  At
> that point we may have to revise this advisory.

Well, this section describes what happens with these specific two PDF files.
I think it's fine as it is. Once people have installed our patch they don't
need to worry about future collisions anyway.

And further in the text, we do explain that there is a general reliance on
SHA1 on the client side as well. Smart cookies will infer the rest.

> > Recommendations:
> > ================
> > 
> >   We recommend all users update to Subversion 1.9.6 which will reject any
> >   commit that would create a SHA1 collision.
> (Nitpick: s/update/upgrade/)
> >   Note that this fix only works if the "representation-sharing" feature is
> >   enabled (it is enabled by default). If the file db/fsfs.conf inside the
> >   repository contains 'enable-rep-sharing = false', this option must be
> >   set to 'true' after upgrading to 1.9.6.
> > 
> >   One solution is just to delete the second file. This will resolve this
> >   problem for normal SVN client usage, but it will not work for tools like
> >   svnsync or git-svn which try to replay every revision in the repository.
> >   This will run into an error on the revision where the content was committed
> >   and the tool will not be able to proceed.
> 
> s/This/they/; s/the tool//

Sure.

> 
> >   A second solution would be to remove the problematic revision with svnadmin.
> >   svnadmin dump can be used to dump the repository up to the revision that
> 
> Quote the command name 'svnadmin dump'?

Sure.

> 
> >   introduced the problem. This dump file can be loaded into a new repository.
> >   If there were more commits after the problematic revision then dump and load
> >   all of these subsequent revisions as well.
> 
> Mention 'svndumpfilter exclude'?

That may be a third possibility. But has anyone tested it?

> >   Another option is to create a Subversion permission rule (authz) that blocks
> >   access to the one or both of the files. This will work with tools like
> >   svnsync and git-svn as the server will not send the colliding content.
> 
> I suggest to give an example; people might not realise that it's
> possible to write authz rules for single files (as opposed to
> directories).  E.g.,
> 
>      [/trunk/tests/data/shattered1.pdf]
>      * =
>      [/trunk/tests/data/shattered2.pdf]
>      * =

Indeed, I agree.

I have committed all the aforementioned changes in r1794707.

Thanks!

Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
[ Reviewing the whole file as of this revision. ]

stsp@apache.org wrote on Tue, May 09, 2017 at 19:07:09 -0000:
> Author: stsp
> Date: Tue May  9 19:07:09 2017
> New Revision: 1794632
> 
> URL: http://svn.apache.org/viewvc?rev=1794632&view=rev
> Log:
> * notes/sha1-advisory.txt: wording tweak
> 
> Modified:
>     subversion/trunk/notes/sha1-advisory.txt

>   Apache Subversion is unable to store SHA1 collisions
> 
> Summary:
> ========
> 
>   Subversion repositories can be corrupted by committing two files
>   which have different content, yet produce the same SHA1 checksum.

I don't think we should call this "corruption": the on-disk data
structures are intact, both syntactically and semantically.  The problem
is in the libraries' assumption that sha1 has no collisions.

I'm afraid I don't have a good suggestion; perhaps "Distinct files that
have equal sha1 checksums cannot be checked out"?

> Details:
> ========
> 
>   In February 2017 a group of researchers released two PDF files which have
>   different content but produce the same SHA1 checksum. This was the first
>   publicly known SHA1 collision ever produced.
> 
>   If both of these files are committed to a Subversion repository, Subversion
>   de-duplicates content based on the SHA1 checksum and only the content of

Missing qualifiers: only for FSFS and FSX and only if rep-sharing is
enabled.  (I see the "Recommendations" section says that, but I think
they belong here.)

>   one of the files ends up being stored in the repository. However, meta data
>   stores the MD5 checksums of both files, and these MD5 checksums differ.
>   This causes problems when Subversion eventually uses the MD5 checksum of
>   the content which was not in fact stored. For example, updates and commits
>   may no longer be possible due to an apparent checksum error.

It'd be shorter and clearer to say "may fail with a checksum error".

Moreover, the error is not spurious; on the contrary: it functions
exactly as designed, and prevents the wrong file from being used.  Let's
say this in the advisory?

The problem is that in a few weeks, when the 'shattered' exploit code is
released, it'll be affordable to create files that collide sha1 and md5
simultaneously, so the md5 checksum error will no longer happen.  At
that point we may have to revise this advisory.

> Recommendations:
> ================
> 
>   We recommend all users update to Subversion 1.9.6 which will reject any
>   commit that would create a SHA1 collision.
(Nitpick: s/update/upgrade/)
>   Note that this fix only works if the "representation-sharing" feature is
>   enabled (it is enabled by default). If the file db/fsfs.conf inside the
>   repository contains 'enable-rep-sharing = false', this option must be
>   set to 'true' after upgrading to 1.9.6.
> 
>   One solution is just to delete the second file. This will resolve this
>   problem for normal SVN client usage, but it will not work for tools like
>   svnsync or git-svn which try to replay every revision in the repository.
>   This will run into an error on the revision where the content was committed
>   and the tool will not be able to proceed.

s/This/they/; s/the tool//

>   A second solution would be to remove the problematic revision with svnadmin.
>   svnadmin dump can be used to dump the repository up to the revision that

Quote the command name 'svnadmin dump'?

>   introduced the problem. This dump file can be loaded into a new repository.
>   If there were more commits after the problematic revision then dump and load
>   all of these subsequent revisions as well.

Mention 'svndumpfilter exclude'?

>   Another option is to create a Subversion permission rule (authz) that blocks
>   access to the one or both of the files. This will work with tools like
>   svnsync and git-svn as the server will not send the colliding content.

I suggest to give an example; people might not realise that it's
possible to write authz rules for single files (as opposed to
directories).  E.g.,

     [/trunk/tests/data/shattered1.pdf]
     * =
     [/trunk/tests/data/shattered2.pdf]
     * =

Cheers,

Daniel

Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-advisory.txt

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
[ Reviewing the whole file as of this revision. ]

stsp@apache.org wrote on Tue, May 09, 2017 at 19:07:09 -0000:
> Author: stsp
> Date: Tue May  9 19:07:09 2017
> New Revision: 1794632
> 
> URL: http://svn.apache.org/viewvc?rev=1794632&view=rev
> Log:
> * notes/sha1-advisory.txt: wording tweak
> 
> Modified:
>     subversion/trunk/notes/sha1-advisory.txt

>   Apache Subversion is unable to store SHA1 collisions
> 
> Summary:
> ========
> 
>   Subversion repositories can be corrupted by committing two files
>   which have different content, yet produce the same SHA1 checksum.

I don't think we should call this "corruption": the on-disk data
structures are intact, both syntactically and semantically.  The problem
is in the libraries' assumption that sha1 has no collisions.

I'm afraid I don't have a good suggestion; perhaps "Distinct files that
have equal sha1 checksums cannot be checked out"?

> Details:
> ========
> 
>   In February 2017 a group of researchers released two PDF files which have
>   different content but produce the same SHA1 checksum. This was the first
>   publicly known SHA1 collision ever produced.
> 
>   If both of these files are committed to a Subversion repository, Subversion
>   de-duplicates content based on the SHA1 checksum and only the content of

Missing qualifiers: only for FSFS and FSX and only if rep-sharing is
enabled.  (I see the "Recommendations" section says that, but I think
they belong here.)

>   one of the files ends up being stored in the repository. However, meta data
>   stores the MD5 checksums of both files, and these MD5 checksums differ.
>   This causes problems when Subversion eventually uses the MD5 checksum of
>   the content which was not in fact stored. For example, updates and commits
>   may no longer be possible due to an apparent checksum error.

It'd be shorter and clearer to say "may fail with a checksum error".

Moreover, the error is not spurious; on the contrary: it functions
exactly as designed, and prevents the wrong file from being used.  Let's
say this in the advisory?

The problem is that in a few weeks, when the 'shattered' exploit code is
released, it'll be affordable to create files that collide sha1 and md5
simultaneously, so the md5 checksum error will no longer happen.  At
that point we may have to revise this advisory.

> Recommendations:
> ================
> 
>   We recommend all users update to Subversion 1.9.6 which will reject any
>   commit that would create a SHA1 collision.
(Nitpick: s/update/upgrade/)
>   Note that this fix only works if the "representation-sharing" feature is
>   enabled (it is enabled by default). If the file db/fsfs.conf inside the
>   repository contains 'enable-rep-sharing = false', this option must be
>   set to 'true' after upgrading to 1.9.6.
> 
>   One solution is just to delete the second file. This will resolve this
>   problem for normal SVN client usage, but it will not work for tools like
>   svnsync or git-svn which try to replay every revision in the repository.
>   This will run into an error on the revision where the content was committed
>   and the tool will not be able to proceed.

s/This/they/; s/the tool//

>   A second solution would be to remove the problematic revision with svnadmin.
>   svnadmin dump can be used to dump the repository up to the revision that

Quote the command name 'svnadmin dump'?

>   introduced the problem. This dump file can be loaded into a new repository.
>   If there were more commits after the problematic revision then dump and load
>   all of these subsequent revisions as well.

Mention 'svndumpfilter exclude'?

>   Another option is to create a Subversion permission rule (authz) that blocks
>   access to the one or both of the files. This will work with tools like
>   svnsync and git-svn as the server will not send the colliding content.

I suggest to give an example; people might not realise that it's
possible to write authz rules for single files (as opposed to
directories).  E.g.,

     [/trunk/tests/data/shattered1.pdf]
     * =
     [/trunk/tests/data/shattered2.pdf]
     * =

Cheers,

Daniel