Posted to dev@depot.apache.org by "Markus M. May" <mm...@gmx.net> on 2004/02/11 08:46:50 UTC

MD5 Hash

Hello,
I just browsed around a little and found some special solutions for the
checksum stuff with MD5 hashes. Ant already has a nice task for this, which I
did not know about. Anyway, what do you think, should we use it? This would
basically mean a tight integration with Ant.

Any comments on this one? 

R,

Markus

----------------------------------------------------
To think without knowing makes the coincidence the ruler...

----------------------------------------------------


Re: MD5 Hash

Posted by "Markus M. May" <mm...@gmx.net>.
I am thinking in the same direction. First I will try to compare the generated
hash with the hash from the mirror. In a second step I will then try to
locate the original .md5 file and compare against that one.
Basically the web of trust is pretty hard to automate right now. You already
have a KEYS file with quite a lot of keys, but you cannot tell which key
signed the file. There is no way to do this (or I missed it). So right now, I
will concentrate on the MD5 stuff.

Markus

> > Basically the MD5 Hash does not need keys.
> > [...]
> > Also apache.org delivers a file named .asc
> 
> Ok, thanks, I get it now (I think.)
> 
> This explains some of the negative comments I've heard about MD5 then (it
> not being too strong). I read on one Apache list that folks are OK with it
> being strong enough, though. What will be tricky for us, should we choose
> to attempt it, will be supporting mirrors while still using the original
> MD5 from Apache...
> 
> Since ASC has keys, that ties in to the 'web of trust' that Apache is
> working on, I think. Once one trusts a certain set of keys, those keys can
> be used to verify others that are acquired, and those can be used to verify
> the ASC. This is much harder to automate, but something we could aspire
> to...
> 
> regards,
> 
> Adam
> 


Re: MD5 Hash

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
> Basically the MD5 Hash does not need keys.
> [...]
> Also apache.org delivers a file named .asc

Ok, thanks, I get it now (I think.)

This explains some of the negative comments I've heard about MD5 then (it
not being too strong). I read on one Apache list that folks are OK with it
being strong enough, though. What will be tricky for us, should we choose to
attempt it, will be supporting mirrors while still using the original MD5
from Apache...

Since ASC has keys, that ties in to the 'web of trust' that Apache is
working on, I think. Once one trusts a certain set of keys, those keys can be
used to verify others that are acquired, and those can be used to verify the
ASC. This is much harder to automate, but something we could aspire to...

regards,

Adam


Re: MD5 Hash

Posted by "Markus M. May" <mm...@gmx.net>.
Hello once again,

> > yes, I can enlighten you all a little bit about MD5 hashes. The basic is
> > that
> 
> Ok, thanks for that. I get the gist, I get the premise. Now, more
> practically...
> 
> What are the inputs to the algorithm? Meaning, we have the file, we have
> the resulting MD5 hash (assuming the file on the server has not been
> modified), and we have the algorithm, but do we need anything else (e.g.
> keys) in order to re-compute/check the resulting hash?

Basically the MD5 hash does not need keys. It is generated from the file
itself, without any password or anything like that. The result is just a hash
code of the file (a hexadecimal number).
> 
> Hmm, what makes folk think that the file could be changed without the MD5
> hash file being changed also. I feel there has to be some private key from
> the originator, to ensure that nobody could fake both.
> 
As stated earlier, there are no keys involved. Since a normal user downloads
apache.org sources or binaries from a mirror, you can check whether the
downloaded file has the same hash code as the original file on apache.org
(by using the original .md5 file from Apache).
apache.org also delivers a file named .asc (at least some projects, like Ant,
do this). This file contains a signature for the original file. It can be
checked using the public key stored in the KEYS file in the root directory of
each project. But this has nothing really to do with the MD5 stuff. MD5
basically just ensures integrity during the download; it does not, as you
said, ensure that the file really is the one that was published or intended
to be published.
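
To make the integrity-versus-authenticity point concrete, here is a minimal
sketch in Python (not the Java code discussed on this list). The byte strings
are made-up stand-ins for artifacts; the only real API used is the standard
hashlib module. It shows that anyone who can replace a file on a mirror can
also regenerate a matching .md5 next to it, so only a comparison against the
hash from the original site notices the change.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of some bytes as a hex string. No key is involved."""
    return hashlib.md5(data).hexdigest()


# Made-up stand-in for the artifact as released on the original site.
original = b"contents of the released artifact"
official_md5 = md5_hex(original)

# A tampered mirror can replace the file AND regenerate the .md5 beside it,
# because computing the hash needs no password or private key.
tampered = b"contents of a modified artifact"
mirror_md5 = md5_hex(tampered)

# The check against the mirror's own .md5 succeeds...
passes_against_mirror_md5 = md5_hex(tampered) == mirror_md5
# ...while the check against the original site's hash fails.
passes_against_official_md5 = md5_hex(tampered) == official_md5
```

This is why the hash has to be fetched from a source you trust more than the
mirror, and why the .asc signature is a separate mechanism.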

> So, if there are such keys, how do we acquire them? How do we trust them?
> 
> regards
> 
> Adam
> 


R,

Markus


Re: MD5 and Mirrors ( was Re: MD5 Hash )

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
> Working together, I believe Depot, Repository, Maven, ... can come 
> to a common agreement on the Apache Repository Structure. The separate 
> groups maintaining different views provide the "tension" necessary for 
> growth of an agreement. The key is eventual compromise and 
> non-possessiveness in all the parties involved.

Well said. Count us in on that..

regards

Adam

Re: MD5 and Mirrors ( was Re: MD5 Hash )

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Well, after my own little survey, I've determined the following:

md5 on BSD (Apache Minotaur):

mdiggory@minotaur:/home/mdiggory> md5 foo.bar
MD5 (foo.bar) = 7f5e787ff3b930d906d01243ccf7c237

md5 has no built-in option to compare the file against the checksum and 
return true/false.

Output of md5sum (GNU textutils) on Red Hat:
mdiggory@osprey:/home/mdiggory> md5sum foo.bar
7f5e787ff3b930d906d01243ccf7c237 foo.bar

md5sum has a built-in option which checks the original file against the 
md5 stored in the checksum file.

[mdiggory@osprey mdiggory]$ md5sum -c foo.bar.md5
foo.bar: OK


When Maven publishes to the repository, the output is the md5 string minus 
the filename, and it is dependent on GNU md5sum.

*example snippet of the command as it's run in Jelly*
     <repository:exec>
       cd ${directory};
       md5sum ${artifactName} | sed 's/ .*$//' | tee ${artifactName}.md5;
       chgrp ${maven.repository.group} *;
       chmod g+w,a+r *;
     </repository:exec>

This results in the string with no filename on ibiblio, and actually fails on 
minotaur, as it's BSD and the md5sum executable is not present.

Which way is right or wrong is not really a reasonable question to ask.

How to appropriately deal with the variants in md5/md5sum ... 
generation and file structure, specifically in relation to the repository, 
are the important questions to throw around.


My opinions are the following:

Server-side, OS-dependent tools are usually accessed in scripts (say, in 
a cron script which does checking and reports errors). These scripts 
will always be specific to an OS, and they are often custom-built for 
that particular need. The author usually writes their own string parsing 
routines (e.g. md5sum foo.bar | sed 's/ .*$//').

A client-side tool needs a simple and standard means of validating the 
content it is about to download from or upload to a server. If the 
repository structure already enforces the name of the md5 file in 
relation to the file name, any naming inside the md5 file is redundant. 
It would be good to have the file contain just the checksum, which 
reduces parsing requirements on both the server and the client.

Client tools should be robust enough (or extensible enough) to generate 
the appropriate md5 sum for a particular artifact and to easily find, 
read, and compare it to the content on the server.
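
As an illustration of the parsing burden described above, here is a rough
Python sketch of a client-side reader that accepts the three layouts seen in
this message: the BSD md5 line, the GNU md5sum line, and the bare hash that
Maven writes. The function name parse_md5_line is made up for this example,
and the apache.org layout is not handled because its exact form is not shown
in this thread.

```python
import re

_HEX = r"[0-9a-fA-F]{32}"
# BSD md5:     MD5 (foo.bar) = 7f5e...
_BSD = re.compile(rf"^MD5 \((?P<name>.+)\) = (?P<hash>{_HEX})$")
# GNU md5sum:  7f5e...  foo.bar   (a "*" may replace the second space)
_GNU = re.compile(rf"^(?P<hash>{_HEX}) [ *]?(?P<name>.+)$")
# Bare hash:   7f5e...
_BARE = re.compile(rf"^(?P<hash>{_HEX})$")


def parse_md5_line(line):
    """Return (hash, filename or None) for one line of an .md5 file."""
    text = line.strip()
    for pattern in (_BSD, _GNU, _BARE):
        match = pattern.match(text)
        if match:
            groups = match.groupdict()
            # The bare layout has no name group, so .get() yields None.
            return groups["hash"].lower(), groups.get("name")
    raise ValueError(f"unrecognised md5 line: {line!r}")
```

A file that contains only the checksum needs just the last pattern, which is
the reduction in parsing requirements argued for above.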


-Mark

Markus M. May wrote:

> Hello Mark,
> 
> this is probably my fault. I checked this whole stuff with a very old 
> maven.md5-file. The format is now equal between the two projects.
> 
> Sorry for the confusion.
> 
> Markus
> 
> 

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


Re: MD5 and Mirrors ( was Re: MD5 Hash )

Posted by "Markus M. May" <mm...@gmx.net>.
Hello Mark,

This is probably my fault. I checked this whole stuff against a very old 
maven.md5-file. The format is now the same in both projects.

Sorry for the confusion.

Markus



Re: MD5 and Mirrors ( was Re: MD5 Hash )

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Adam R. B. Jack wrote:

>>Adam is perfectly right about this stuff. There is one more thing we need
>>to think about. Some repositories treat md5-files different. The structure
>>on apache.org is [filename - MD5 Hash]. But on ibiblio (maven-repository)
>>it is just [MD5 Hash]. So this needs to be somehow configurable.
> 
> 
> I think we need to shoot for what is considered the Apache Repository, none
> other. What Maven do w/ ibiblio will clearly impact us, but ought be
> secondary.
> 

A standard file format for md5 is more important in the long run, I 
think, than the way either Apache in general or the Maven project 
specifically generates the file contents of md5 checksums.

Currently, neither Apache's nor Maven's md5 files can be validated using 
the standard FSF GNU md5sum implementation.

> That said, Apache Repository is about to become Maven Repository (taken over
> by the Maven team) unless we help Apache get its act together. Still, it
> might not be a bad thing, we expect them to be a primary publisher.
> 
> regards
> 
> Adam
> 

Working together, I believe Depot, Repository, Maven, ... can come 
to a common agreement on the Apache Repository Structure. The separate 
groups maintaining different views provide the "tension" necessary for 
growth of an agreement. The key is eventual compromise and 
non-possessiveness in all the parties involved.

-Mark

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu

Re: MD5 and Mirrors ( was Re: MD5 Hash )

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
> Adam is perfectly right about this stuff. There is one more thing we need
> to think about. Some repositories treat md5-files different. The structure
> on apache.org is [filename - MD5 Hash]. But on ibiblio (maven-repository)
> it is just [MD5 Hash]. So this needs to be somehow configurable.

I think we need to shoot for what is considered the Apache Repository, none
other. What Maven do w/ ibiblio will clearly impact us, but ought be
secondary.

That said, Apache Repository is about to become Maven Repository (taken over
by the Maven team) unless we help Apache get its act together. Still, it
might not be a bad thing; we expect them to be a primary publisher.

regards

Adam


MD5 Standards (was Re: MD5 and Mirrors ( was Re: MD5 Hash ))

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Besides, my current experiments with GNU md5sum (2.0.21) show that the 
sums on the Maven contents aren't verifiable by any tool other than the 
Maven checksum plugin.

If they aren't verifiable by external tools, that's a bad situation. I'm 
going to bring this up on the Maven list too.

http://www.faqs.org/rfcs/rfc1321.html

A quick "dig" through the RFC suggests a loophole here, as there is 
no reference to what the contents of an md5 signature file should look 
like. It seems to be more of an inherent "suggestion" in the 
implementation itself.
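
For what it's worth, here is a small Python sketch of how a bare-hash file
could be turned into a line that md5sum -c should accept. It assumes the
layout GNU md5sum prints, which to my knowledge is the hash, a space, a
one-character mode flag (a second space in text mode) and the file name. The
helper name to_md5sum_line is made up for this example.

```python
def to_md5sum_line(bare_hash, filename):
    """Build a "<hash>  <filename>" line in the layout GNU md5sum prints."""
    digest = bare_hash.strip().lower()
    if len(digest) != 32 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError(f"not an MD5 hex digest: {bare_hash!r}")
    # Two spaces: one separator plus the text-mode flag (assumed layout).
    return f"{digest}  {filename}\n"
```

Writing the returned line to foo.bar.md5 next to foo.bar would give the
external tool something it can parse, without changing the hash itself.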

-Mark

Mark R. Diggory wrote:

> Its a tough call, is there any "standard" for the structure of the md5 
> contents out there? I think the Maven team would be keen to play along 
> with a standard and yet play along with any configurability as well.
> 
> -Mark Diggory
> 
> Markus M. May wrote:
> 
>> Adam is perfectly right about this stuff. There is one more thing we
>> need to think about. Some repositories treat md5-files different. The
>> structure on apache.org is [filename - MD5 Hash]. But on ibiblio
>> (maven-repository) it is just [MD5 Hash]. So this needs to be somehow
>> configurable.
>> One more thing to think about :-)

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


Re: MD5 and Mirrors ( was Re: MD5 Hash )

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
It's a tough call. Is there any "standard" for the structure of the md5 
contents out there? I think the Maven team would be keen to play along 
with a standard and yet play along with any configurability as well.

-Mark Diggory

Markus M. May wrote:
> Adam is perfectly right about this stuff. There is one more thing we need to
> think about. Some repositories treat md5-files different. The structure on
> apache.org is [filename - MD5 Hash]. But on ibiblio (maven-repository) it is
> just [MD5 Hash]. So this needs to be somehow configurable. 
> 
> One more thing to think about :-)
> 
> 
>>Nick wrote:
>>
>>
>>>The MD5 should always come from the authoritative source (apache.org)
>>>using https.
>>
>>I'm not sure if all environments (JVMs) have HTTPS available. In a
>>somewhat perfect world we'd try HTTPS and if it failed try HTTP, unless
>>some 'minimum security' was requested.
>>
>>I think we'll have to experiment and experience this area over
>>time/iterations.
>>
>>
>>>How are we going to know what the "authoritative" source for a resource
>>>is.
>>>For java we could enforce a reverse domain name.
>>
>>Four things:
>>
>>1) Repository URI/URL is what it is (whatever it is) and the URL for the
>>MD5 ought be the URL for the resources plus ".md5" on the end.
>>
>>2) As current Ruper thinking (coding) goes ... Mirrors ought mirror the
>>hierarchy, so wherever a resource is in the repo, the .md5 ought be next
>>to it, and the original .md5 ought be in exactly the same relative
>>position (just relative to an apache root).
>>
>>3) Mirroring is kinda hacked into Ruper right now, it silently moves the
>>root of a repository (originally set relative to the mirror locator CGI
>>script) to one such mirror. As such Ruper doesn't really know about
>>mirrors.
>>
>>4) We probably need to rethink current thinking... ;-)
>>
>>regards,
>>
>>Adam
>>
> 
> 

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu

Re: MD5 and Mirrors ( was Re: MD5 Hash )

Posted by "Markus M. May" <mm...@gmx.net>.
Adam is perfectly right about this stuff. There is one more thing we need to
think about. Some repositories treat md5 files differently. The structure on
apache.org is [filename - MD5 Hash], but on ibiblio (the Maven repository) it
is just [MD5 Hash]. So this needs to be configurable somehow.

One more thing to think about :-)

> Nick wrote:
> 
> > The MD5 should always come from the authoritative source (apache.org)
> > using https.
> 
> I'm not sure if all environments (JVMs) have HTTPS available. In a
> somewhat perfect world we'd try HTTPS and if it failed try HTTP, unless
> some 'minimum security' was requested.
> 
> I think we'll have to experiment and experience this area over
> time/iterations.
> 
> > How are we going to know what the "authoritative" source for a resource
> > is.
> > For java we could enforce a reverse domain name.
> 
> Four things:
> 
> 1) Repository URI/URL is what it is (whatever it is) and the URL for the
> MD5 ought be the URL for the resources plus ".md5" on the end.
> 
> 2) As current Ruper thinking (coding) goes ... Mirrors ought mirror the
> hierarchy, so wherever a resource is in the repo, the .md5 ought be next
> to it, and the original .md5 ought be in exactly the same relative
> position (just relative to an apache root).
> 
> 3) Mirroring is kinda hacked into Ruper right now, it silently moves the
> root of a repository (originally set relative to the mirror locator CGI
> script) to one such mirror. As such Ruper doesn't really know about
> mirrors.
> 
> 4) We probably need to rethink current thinking... ;-)
> 
> regards,
> 
> Adam
> 


MD5 and Mirrors ( was Re: MD5 Hash )

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
Nick wrote:

> The MD5 should always come from the authoritative source (apache.org)
> using https.

I'm not sure if all environments (JVMs) have HTTPS available. In a somewhat
perfect world we'd try HTTPS and, if it failed, try HTTP, unless some 'minimum
security' was requested.

I think we'll have to experiment and gain experience in this area over
time/iterations.
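
The HTTPS-first idea above could look roughly like this. It is only a Python
sketch: the function name fetch_md5, the require_https flag (standing in for
the 'minimum security' request) and the caller-supplied fetch callable are
all assumptions made for this example, not anything that exists in Ruper.

```python
def fetch_md5(host_and_path, fetch, require_https=False):
    """Try HTTPS first; fall back to HTTP unless HTTPS is required.

    `fetch` is a caller-supplied function (hypothetical) that takes a URL,
    returns the response text, and raises OSError on failure.
    Returns a (text, scheme) pair so the caller knows which one was used.
    """
    try:
        return fetch("https://" + host_and_path), "https"
    except OSError:
        if require_https:
            # Minimum security was requested, so do not downgrade.
            raise
        return fetch("http://" + host_and_path), "http"
```

Returning the scheme alongside the text lets the caller report, or refuse,
a checksum that arrived over plain HTTP.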

> How are we going to know what the "authoritative" source for a resource
> is.
> For java we could enforce a reverse domain name.

Four things:

1) The repository URI/URL is what it is (whatever it is), and the URL for the
MD5 ought to be the URL for the resource plus ".md5" on the end.

2) As current Ruper thinking (coding) goes ... mirrors ought to mirror the
hierarchy, so wherever a resource is in the repo, the .md5 ought to be next to
it, and the original .md5 ought to be in exactly the same relative position
(just relative to an Apache root).

3) Mirroring is kind of hacked into Ruper right now: it silently moves the
root of a repository (originally set relative to the mirror locator CGI
script) to one such mirror. As such, Ruper doesn't really know about mirrors.

4) We probably need to rethink current thinking... ;-)
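
Points 1 and 2 above amount to simple URL arithmetic. Here is a Python sketch
of it; the function names are made up for this example and the roots passed
in are whatever the caller considers the mirror root and the authoritative
root, nothing here is taken from Ruper's code.

```python
def md5_url(resource_url):
    """Point 1: the checksum lives at the resource URL plus ".md5"."""
    return resource_url + ".md5"


def original_md5_url(mirror_root, resource_url, authoritative_root):
    """Point 2: same relative path, but under the authoritative root."""
    mirror_root = mirror_root.rstrip("/") + "/"
    if not resource_url.startswith(mirror_root):
        raise ValueError("resource is not under the given mirror root")
    # Path of the resource relative to the mirror's root.
    relative = resource_url[len(mirror_root):]
    return authoritative_root.rstrip("/") + "/" + relative + ".md5"
```

This only works if the tool still knows both roots, which is exactly what
point 3 says Ruper loses when it silently swaps the repository root.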

regards,

Adam


Re: MD5 Hash

Posted by Nick Chalko <ni...@chalko.com>.
Adam R. B. Jack wrote:

>Hmm, what makes folk think that the file could be changed without the MD5
>hash file being changed also. I feel there has to be some private key from
>the originator, to ensure that nobody could fake both.
>
>  
>
The MD5 should always come from the authoritative source (apache.org)
using https.

How are we going to know what the "authoritative" source for a resource
is?
For Java we could enforce a reverse domain name,

i.e. packages like org.apache.... must get an md5 from an apache.org
website.
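
A rough Python sketch of that reverse-domain rule follows. It assumes that
the first two components of the package name are the reversed domain, which
is a simplification (it would not cope with names like uk.co.example), and
both function names are made up for this example.

```python
def authoritative_domain(package_name):
    """Map a reverse-domain package name to its presumed owning domain.

    Assumption: the first two components are the reversed domain, so
    "org.apache.tools.ant" maps to "apache.org".
    """
    parts = package_name.split(".")
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"not a reverse-domain name: {package_name!r}")
    return f"{parts[1]}.{parts[0]}"


def is_authoritative(package_name, md5_host):
    """True if md5_host is the owning domain or one of its subdomains."""
    domain = authoritative_domain(package_name)
    host = md5_host.lower()
    # Require a dot boundary so "evilapache.org" does not match "apache.org".
    return host == domain or host.endswith("." + domain)
```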

>So, if there are such keys, how do we acquire them? How do we trust them?
>
>regards
>
>Adam
>  
>



Re: MD5 Hash

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
> yes, I can enlighten you all a little bit about MD5 hashes. The basic is
> that

Ok, thanks for that. I get the gist, I get the premise. Now, more
practically...

What are the inputs to the algorithm? Meaning, we have the file, we have the
resulting MD5 hash (assuming the file on the server has not been modified),
and we have the algorithm, but do we need anything else (e.g. keys) in order
to re-compute/check the resulting hash?

Hmm, what makes folks think that the file could be changed without the MD5
hash file being changed also? I feel there has to be some private key from
the originator, to ensure that nobody could fake both.

So, if there are such keys, how do we acquire them? How do we trust them?

regards

Adam


Re: MD5 Hash

Posted by "Markus M. May" <mm...@gmx.net>.
Hello,

yes, I can enlighten you all a little bit about MD5 hashes. The basic idea is
that a hash is a practically unique key for a value (in this case a file). From
this key you cannot guess or regenerate the original value (the file). It is
generated with the MD5 algorithm using Java's security classes. So basically,
when a file is deployed on the apache.org servers, its MD5 hash is generated.
When the file is updated, the hash of the updated file is normally different
(the chance of it being the same is very, very small). So, basically, a new
hash is generated for each file. You can then create another hash from the
same file. If you use the same algorithm (MD5/SHA), you get the same hash
(meaning: from the same file you always get the same hash code when using the
same algorithm). On the apache.org servers there is always an .md5 file for
each deployed file, containing the original filename and the hash code. This
basically means you can generate the hash code with the same algorithm and
then check whether the hash in the .md5 file is the same as the generated
one. If it is not, it is a good guess that the file you downloaded is not the
one published by apache.org.
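
The check described above can be sketched in a few lines. This is Python
using the standard hashlib module, not the Java code mentioned in the text,
and the function names are made up for this example. The published hash is
assumed to have been extracted from the .md5 file already.

```python
import hashlib


def md5_hex(path, chunk_size=65536):
    """Return the MD5 digest of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the locally generated hash with the published one."""
    return md5_hex(path) == published_hex.strip().lower()
```

The same file always gives the same digest, so a False result means the
downloaded bytes differ from the bytes the published hash was generated from.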

Hope this helps you understand the issue. If there are more questions
concerning this, just go ahead and ask.

R,

Markus


> > I just browsed a little around and found some special solutions for the
> > checksum stuff with MD5-Hashes. ANT has already a nice task for this
> 
> How would we integrate with it? Is the task part of 'core'? I'm not
> against leveraging others, especially ant, 'cos I suspect that'll be a
> large part of our user base. So, as a start, I'd be for it.
> 
> That said, longer term I'd love to see it for command line also.
> 
> BTW: Are you at a point where you can explain the mechanics of this? What
> keys does one use to check an MD5? Where do the keys come from, can we
> trust them, etc.? Can you educate us all?
> 
> regards,
> 
> Adam
> 


Re: MD5 Hash

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
> I just browsed a little around and found some special solutions for the
> checksum stuff with MD5-Hashes. ANT has already a nice task for this

How would we integrate with it? Is the task part of 'core'? I'm not against
leveraging others, especially ant, 'cos I suspect that'll be a large part of
our user base. So, as a start, I'd be for it.

That said, longer term I'd love to see it for command line also.

BTW: Are you at a point where you can explain the mechanics of this? What
keys does one use to check an MD5? Where do the keys come from, can we trust
them, etc.? Can you educate us all?

regards,

Adam