Posted to user@couchdb.apache.org by Paul Hirst <pa...@sophos.com> on 2010/09/23 09:44:43 UTC

MD5 collisions

Hi,

There was a previous thread about exposing the MD5 of attachments and
this got me thinking.

Since MD5 is 'broken' (i.e. two different files can be generated with the
same MD5 hash), I have a few questions.

      * Does this actually break CouchDB? I.e. would it be impossible to
        upload two different attachments with the same MD5?
              * To the same document?
              * To different documents?
      * Are there any other implications? Would replication get
        confused?
      * Has anyone considered switching to a stronger checksum?

This isn't just a theoretical problem for me. I would genuinely like to
store two files in CouchDB which have the same MD5.




Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

Re: MD5 collisions

Posted by Paul Hirst <pa...@sophos.com>.
On Thu, 2010-09-23 at 09:24 +0100, Sebastian Cohnen wrote:
> Still, I don't see a reason for that situation to become a real problem:
>
> >      * Does this actually break CouchDB? I.e. would it be impossible to
> >        upload two different attachments with the same MD5?
> >              * To the same document?
>
> I'm not sure about this one, but since attachments are "namespaced" by their name, which has to be unique per document, I don't see a problem with them having the same MD5 sum.
>
> >              * To different documents?
>
> Different documents are "namespaced" by their ID, so no problem here.
>
> >      * Are there any other implications? Would replication get
> >        confused?
>
> AFAIK, replication only looks at _id and _rev.

Thanks a lot, that all sounds like excellent news.



Re: MD5 collisions

Posted by Sebastian Cohnen <se...@googlemail.com>.
Still, I don't see a reason for that situation to become a real problem:

>      * Does this actually break CouchDB? I.e. would it be impossible to
>        upload two different attachments with the same MD5?
>              * To the same document?

I'm not sure about this one, but since attachments are "namespaced" by their name, which has to be unique per document, I don't see a problem with them having the same MD5 sum.

>              * To different documents?

Different documents are "namespaced" by their ID, so no problem here.

>      * Are there any other implications? Would replication get
>        confused?

AFAIK, replication only looks at _id and _rev.
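
The namespacing argument can be sketched in Python as a toy in-memory model. This is not CouchDB's implementation: the class ToyAttachmentStore, its methods, and the colliding_digest stand-in (which simulates a crafted MD5 collision by returning one fixed digest for every input) are all hypothetical names chosen for illustration. The only point it makes is that when attachments are looked up by (document ID, attachment name), two blobs with the same digest do not overwrite each other.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """Real MD5 hex digest of the attachment body."""
    return hashlib.md5(data).hexdigest()


def colliding_digest(data: bytes) -> str:
    """Stand-in for a crafted collision: every input gets the same digest."""
    return "00000000000000000000000000000000"


class ToyAttachmentStore:
    """Hypothetical model: attachments keyed by (doc_id, name), not by digest."""

    def __init__(self, digest_fn=md5_digest):
        self._digest_fn = digest_fn
        # (doc_id, attachment_name) -> (digest, raw bytes)
        self._attachments = {}

    def put(self, doc_id: str, name: str, data: bytes) -> str:
        digest = self._digest_fn(data)
        # The digest is stored as metadata only; it never acts as the key.
        self._attachments[(doc_id, name)] = (digest, data)
        return digest

    def get(self, doc_id: str, name: str) -> bytes:
        return self._attachments[(doc_id, name)][1]


# Two different bodies with the "same" digest, in one document and across two.
store = ToyAttachmentStore(digest_fn=colliding_digest)
store.put("doc-a", "poc1.exe", b"first file")
store.put("doc-a", "poc2.exe", b"second file")
store.put("doc-b", "poc1.exe", b"third file")
assert store.get("doc-a", "poc1.exe") != store.get("doc-a", "poc2.exe")
```

A store that used the digest itself as the key (content-addressed storage) would behave differently: the second upload would silently replace or alias the first. Whether any part of CouchDB does that is exactly the open question in this thread, and the sketch does not answer it.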



On 23.09.2010, at 10:13, Paul Hirst wrote:

> On Thu, 2010-09-23 at 08:54 +0100, Sebastian Cohnen wrote:
>> The collision probability is quite low. MD5 is considered to be broken
>> from a cryptographic point of view: an attacker can craft two different
>> files that have the exact same hash. I doubt that you are going to
>> encounter a collision in practice under "normal" usage.
> 
> I really do have two crafted files with the same MD5 that I'd like to
> store in CouchDB. They are proof of concept Windows executables and they
> just happen to live in the set of files I'd like to store in Couch. It's
> just 2 out of many millions of files but I'd really value an opinion on
> whether anything will break and, if so, in what way.
> 
> I'll admit, this is an unusual use case.
> 
> I want to use CouchDB to store files and metadata about files relating
> to vulnerabilities, exploits, malware, etc. I could decide to throw away
> these proof of concept files because they aren't actually that
> interesting but there is a good chance the database I want to build
> would see more of them in future.
> 
> Obviously, under normal usage this sort of thing would never be a
> problem.
> 
> 


Re: MD5 collisions

Posted by Paul Hirst <pa...@sophos.com>.
On Thu, 2010-09-23 at 08:54 +0100, Sebastian Cohnen wrote:
> The collision probability is quite low. MD5 is considered to be broken
> from a cryptographic point of view: an attacker can craft two different
> files that have the exact same hash. I doubt that you are going to
> encounter a collision in practice under "normal" usage.

I really do have two crafted files with the same MD5 that I'd like to
store in CouchDB. They are proof of concept Windows executables and they
just happen to live in the set of files I'd like to store in Couch. It's
just 2 out of many millions of files but I'd really value an opinion on
whether anything will break and, if so, in what way.

I'll admit, this is an unusual use case.

I want to use CouchDB to store files and metadata about files relating
to vulnerabilities, exploits, malware, etc. I could decide to throw away
these proof of concept files because they aren't actually that
interesting but there is a good chance the database I want to build
would see more of them in future.

Obviously, under normal usage this sort of thing would never be a
problem.



Re: MD5 collisions

Posted by Sebastian Cohnen <se...@googlemail.com>.
The collision probability is quite low. MD5 is considered to be broken from a cryptographic point of view: an attacker can craft two different files that have the exact same hash. I doubt that you are going to encounter a collision in practice under "normal" usage.

So no, I would not consider the usage of MD5 to break CouchDB.
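
As a rough illustration of "quite low", the chance of an accidental collision can be estimated with the birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n items and a b-bit hash. The sketch below is in Python; the function name collision_probability and the figure of 10 million files are assumptions made for the example, and the estimate treats MD5 output as uniformly random, so it says nothing about deliberately crafted collisions.

```python
import math


def collision_probability(num_items: int, hash_bits: int = 128) -> float:
    """Birthday-bound estimate of at least one accidental collision.

    Assumes hash values are uniformly distributed over 2**hash_bits outputs.
    """
    exponent = -num_items * (num_items - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


# Assumed collection size for illustration: 10 million files, 128-bit MD5.
p = collision_probability(10_000_000)
print(f"accidental collision probability: {p:.3e}")
```

For the assumed 10 million files the estimate is on the order of 1e-25, which is why accidental collisions are not a practical concern. Crafted collisions, like the two proof-of-concept executables discussed in this thread, fall outside this model entirely.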

On 23.09.2010, at 09:44, Paul Hirst wrote:

> Hi,
> 
> There was a previous thread about exposing the MD5 of attachments and
> this got me thinking.
> 
> Since MD5 is 'broken' (i.e. two different files can be generated with the
> same MD5 hash), I have a few questions.
> 
>      * Does this actually break CouchDB? I.e. would it be impossible to
>        upload two different attachments with the same MD5?
>              * To the same document?
>              * To different documents?
>      * Are there any other implications? Would replication get
>        confused?
>      * Has anyone considered switching to a stronger checksum?
> 
> This isn't just a theoretical problem to me. I would genuinely like to
> store two files in couchdb which have the same MD5.
> 
> 
> 
> 