Posted to solr-dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2006/12/08 10:12:57 UTC

"correct" format for the md5 files?

I thought adding the <checksum> hooks to our build.xml to generate the MD5
sums as part of the package command (so we don't have to run it manually)
would be really easy ... but discovered that ant outputs only the checksum
to the files -- no input filename, no newline.

Apparently, a "format" option has been added to the <checksum> tag in the
trunk to let you choose which format you want...

	Specifies the pattern to use as one of a well-known format.
	Supported values are "CHECKSUM" (only the checksum itself, the default),
	"MD5SUM" the format of GNU textutils md5sum and "SVF" the format of *BSDs
	md5 command.
	(see: http://ant.apache.org/manual-rc/CoreTasks/checksum.html )
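
To make the three layouts concrete, here is a minimal Python sketch that
renders one digest each way. The function name format_checksum is
hypothetical, and the exact templates are an assumption based on the usual
md5sum and BSD md5 output; Ant's own output may differ in small details.

```python
import hashlib


def format_checksum(data: bytes, filename: str, style: str = "CHECKSUM") -> str:
    """Render the MD5 digest of `data` in one of three common layouts.

    The templates below are assumptions modelled on typical tool output,
    not copied from Ant's source.
    """
    digest = hashlib.md5(data).hexdigest()
    if style == "CHECKSUM":
        # Bare digest: no filename, no trailing newline.
        return digest
    if style == "MD5SUM":
        # GNU md5sum layout; '*' marks a file read in binary mode.
        return f"{digest} *{filename}\n"
    if style == "SVF":
        # BSD md5 layout.
        return f"MD5 ({filename}) = {digest}\n"
    raise ValueError(f"unknown style: {style}")
```

Only the MD5SUM and SVF layouts carry the filename, so only those can be
checked without knowing which file the digest belongs to.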

...but it got me wondering, what format do we want?

Lucene-Java uses GNU textutils format...
http://www.apache.org/dist/lucene/java/lucene-2.0.0-src.tar.gz.md5

Nutch was using just the checksum...
http://www.apache.org/dist/lucene/nutch/nutch-0.7.2.tar.gz.md5
...but then switched to the BSD format...
http://www.apache.org/dist/lucene/nutch/nutch-0.8.tar.gz.md5
...and then switched back...
http://www.apache.org/dist/lucene/nutch/nutch-0.8.1.tar.gz.md5


Which one should we use?

Does it matter as long as we are consistent? .. because if the answer is
"no" then i vote for applying this to our build.xml ...

Index: build.xml
===================================================================
--- build.xml   (revision 483882)
+++ build.xml   (working copy)
@@ -70,6 +70,9 @@
   <!-- Destination for distribution files (demo WAR, src distro, etc.) -->
   <property name="dist" value="dist" />

+  <!-- Type of checksum to compute for distribution files -->
+  <property name="checksum.algorithm" value="md5" />
+
   <!-- Example directory -->
   <property name="example" value="example" />

@@ -416,6 +419,17 @@
         prefix="${fullnamever}/docs/api/" />
     </tar>

+    <checksum fileext=".${checksum.algorithm}"
+              algorithm="${checksum.algorithm}"
+              todir="${dist}"
+              forceoverwrite="yes"
+    >
+       <fileset dir="${dist}">
+         <include name="${fullnamever}*"/>
+         <exclude name="*.${checksum.algorithm}"/>
+       </fileset>
+    </checksum>
+
   </target>

   <target name="nightly"
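
For anyone who doesn't read Ant, the <checksum> block in the patch above
does roughly what this Python sketch does: for every file in the dist
directory whose name starts with the release name, skipping existing
checksum files, write the digest to a sibling file. The function name
write_checksums and its parameters are hypothetical, and the output is
assumed to be the bare digest with no filename and no newline, as described
at the top of this mail.

```python
import hashlib
from pathlib import Path


def write_checksums(dist: Path, prefix: str, algorithm: str = "md5") -> list:
    """Write <file>.<algorithm> next to each matching file in `dist`."""
    ext = f".{algorithm}"
    written = []
    # sorted() materialises the listing first, so the files created
    # below are not picked up during the same run.
    for path in sorted(dist.glob(f"{prefix}*")):
        # Mirrors the <exclude name="*.${checksum.algorithm}"/> rule.
        if not path.is_file() or path.name.endswith(ext):
            continue
        digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
        sidecar = path.with_name(path.name + ext)
        sidecar.write_text(digest)  # overwrites, like forceoverwrite="yes"
        written.append(sidecar)
    return written
```

Reading each file fully into memory keeps the sketch short; a real build
step would probably hash in chunks.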


Re: Re: "correct" format for the md5 files?

Posted by Yoav Shapira <yo...@apache.org>.
BTW, for those concerned, there's nothing at the ASF that says you
must use only MD5.  You can add SHA-1 or any other algorithm if you
want.  See Ant for example: they've been doing MD5 and SHA-1 side by
side for years now (http://ant.apache.org/bindownload.cgi)
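
Producing more than one digest is cheap, since the file only has to be read
once. A minimal Python sketch, assuming nothing about how Ant or any ASF
project actually does it; the function name digests and its defaults are
hypothetical.

```python
import hashlib


def digests(path, algorithms=("md5", "sha1"), chunk_size=64 * 1024):
    """Return {algorithm: hex digest} for `path`, reading the file once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        # Feed every chunk to every hasher so large files are never
        # held in memory in full.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Passing algorithms=("md5", "sha1", "sha256") would add a SHA-2 digest in
the same pass.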

Yoav

On 12/8/06, Yonik Seeley <yo...@apache.org> wrote:
> On 12/8/06, Chris Hostetter <ho...@fucit.org> wrote:
> > : It _is_ a valid concern in general (I would never use md5 as a
> > : cryptographic hash, e.g., for passwords), but significantly less of a
> > : concern for this use.  The most important role of the hash is to
> > : ensure no corruption occurred during transfer.
> >
> > Bingo:  We checksum the files with MD5, we sign the files with GPG
>
> And the standard digital signature content hash is defined to be SHA-1
> AFAIK.  And yes, someone has managed to find a way to get collisions
> in SHA1 hashes in less time than it would take to purely guess at
> random.  But let's be serious... for our projects it's going to be far
> easier and cheaper to circumvent the encryption than break it.
>
> When PGP/GPG switch to a different mechanism by default, so will we.
>
> -Yonik
>

Re: Re: "correct" format for the md5 files?

Posted by Yonik Seeley <yo...@apache.org>.
On 12/8/06, Chris Hostetter <ho...@fucit.org> wrote:
> : It _is_ a valid concern in general (I would never use md5 as a
> : cryptographic hash, e.g., for passwords), but significantly less of a
> : concern for this use.  The most important role of the hash is to
> : ensure no corruption occurred during transfer.
>
> Bingo:  We checksum the files with MD5, we sign the files with GPG

And the standard digital signature content hash is defined to be SHA-1
AFAIK.  And yes, someone has managed to find a way to get collisions
in SHA1 hashes in less time than it would take to purely guess at
random.  But let's be serious... for our projects it's going to be far
easier and cheaper to circumvent the encryption than break it.

When PGP/GPG switch to a different mechanism by default, so will we.

-Yonik

Re: Re: "correct" format for the md5 files?

Posted by Chris Hostetter <ho...@fucit.org>.
: It _is_ a valid concern in general (I would never use md5 as a
: cryptographic hash, e.g., for passwords), but significantly less of a
: concern for this use.  The most important role of the hash is to
: ensure no corruption occurred during transfer.

Bingo:  We checksum the files with MD5, we sign the files with GPG



-Hoss


Re: Re: "correct" format for the md5 files?

Posted by Mike Klaas <mi...@gmail.com>.
On 12/8/06, Simon Willnauer <si...@googlemail.com> wrote:
> Oh by the way I do have 2 people in this room being able to find
> collisions to md5 within the next 15 minutes. But it is true that this
> is quite hypothetical.
>
> anyway...

Can they also produce a malicious distribution of solr which hashes
identically? <g>.

It _is_ a valid concern in general (I would never use md5 as a
cryptographic hash, e.g., for passwords), but significantly less of a
concern for this use.  The most important role of the hash is to
ensure no corruption occurred during transfer.

cheers,
-Mike

Re: "correct" format for the md5 files?

Posted by Simon Willnauer <si...@googlemail.com>.
Oh by the way I do have 2 people in this room being able to find
collisions to md5 within the next 15 minutes. But it is true that this
is quite hypothetical.

anyway...

yours simon

On 12/8/06, Simon Willnauer <si...@googlemail.com> wrote:
> True, so do it properly if you can.
>
>
> best regards simon
>
> On 12/8/06, WHIRLYCOTT <ph...@whirlycott.com> wrote:
> > This isn't as urgent as you make it out to be.  There are just a few
> > people in the world, mostly Chinese researchers, who have the
> > capability to do this.  I agree that SHA is better, but this clearly
> > isn't the type of thing that should hold up a Solr release!
> >
> > phil.
> >
> > On Dec 8, 2006, at 4:37 PM, Simon Willnauer wrote:
> >
> > > Hello,
> > > I'm wondering why people still use MD5 for digital signatures and / or
> > > checksums.
> > > Recent results on the analysis of MD5 reduce the effort to find
> > > collisions to a few minutes on an old notebook. Thus, collision and
> > > multi-collision attacks on MD5 are feasible and practical.
> > > > I would recommend migrating directly from MD5 to SHA-2 and adding SHA-2
> > > > hashes to existing MD5 lists if possible. Wherever MD5 is still used
> > > > to detect the manipulation of data or software, it must be replaced as
> > > > soon as possible!
> > > >
> > > > just my 2 cents.
> > >
> > > best regards simon
> > >
> > > On 12/8/06, Bertrand Delacretaz <bd...@apache.org> wrote:
> > >> On 12/8/06, Chris Hostetter <ho...@fucit.org> wrote:
> > >>
> > >> > ...but it got me wondering, what format do we want?...
> > >>
> > >> The format that Yonik used works (on my macosx system, but also under
> > >> Linux I suspect) with
> > >>
> > >>   md5sum -c apache-solr-1.1.0-incubating.tgz.md5
> > >>
> > >> which is convenient I think.
> > >>
> > >> -Bertrand
> > >>
> >
> >
> > --
> >                                     Whirlycott
> >                                     Philip Jacob
> >                                     phil@whirlycott.com
> >                                     http://www.whirlycott.com/phil/
> >
> >
> >
>

Re: "correct" format for the md5 files?

Posted by Simon Willnauer <si...@googlemail.com>.
True, so do it properly if you can.


best regards simon

On 12/8/06, WHIRLYCOTT <ph...@whirlycott.com> wrote:
> This isn't as urgent as you make it out to be.  There are just a few
> people in the world, mostly Chinese researchers, who have the
> capability to do this.  I agree that SHA is better, but this clearly
> isn't the type of thing that should hold up a Solr release!
>
> phil.
>
> On Dec 8, 2006, at 4:37 PM, Simon Willnauer wrote:
>
> > Hello,
> > I'm wondering why people still use MD5 for digital signatures and / or
> > checksums.
> > Recent results on the analysis of MD5 reduce the effort to find
> > collisions to a few minutes on an old notebook. Thus, collision and
> > multi-collision attacks on MD5 are feasible and practical.
> > I would recommend migrating directly from MD5 to SHA-2 and adding SHA-2
> > hashes to existing MD5 lists if possible. Wherever MD5 is still used
> > to detect the manipulation of data or software, it must be replaced as
> > soon as possible!
> >
> > just my 2 cents.
> >
> > best regards simon
> >
> > On 12/8/06, Bertrand Delacretaz <bd...@apache.org> wrote:
> >> On 12/8/06, Chris Hostetter <ho...@fucit.org> wrote:
> >>
> >> > ...but it got me wondering, what format do we want?...
> >>
> >> The format that Yonik used works (on my macosx system, but also under
> >> Linux I suspect) with
> >>
> >>   md5sum -c apache-solr-1.1.0-incubating.tgz.md5
> >>
> >> which is convenient I think.
> >>
> >> -Bertrand
> >>
>
>
> --
>                                     Whirlycott
>                                     Philip Jacob
>                                     phil@whirlycott.com
>                                     http://www.whirlycott.com/phil/
>
>
>

Re: "correct" format for the md5 files?

Posted by WHIRLYCOTT <ph...@whirlycott.com>.
This isn't as urgent as you make it out to be.  There are just a few
people in the world, mostly Chinese researchers, who have the
capability to do this.  I agree that SHA is better, but this clearly
isn't the type of thing that should hold up a Solr release!

phil.

On Dec 8, 2006, at 4:37 PM, Simon Willnauer wrote:

> Hello,
> I'm wondering why people still use MD5 for digital signatures and / or
> checksums.
> Recent results on the analysis of MD5 reduce the effort to find
> collisions to a few minutes on an old notebook. Thus, collision and
> multi-collision attacks on MD5 are feasible and practical.
> I would recommend migrating directly from MD5 to SHA-2 and adding SHA-2
> hashes to existing MD5 lists if possible. Wherever MD5 is still used
> to detect the manipulation of data or software, it must be replaced as
> soon as possible!
>
> just my 2 cents.
>
> best regards simon
>
> On 12/8/06, Bertrand Delacretaz <bd...@apache.org> wrote:
>> On 12/8/06, Chris Hostetter <ho...@fucit.org> wrote:
>>
>> > ...but it got me wondering, what format do we want?...
>>
>> The format that Yonik used works (on my macosx system, but also under
>> Linux I suspect) with
>>
>>   md5sum -c apache-solr-1.1.0-incubating.tgz.md5
>>
>> which is convenient I think.
>>
>> -Bertrand
>>


--
                                    Whirlycott
                                    Philip Jacob
                                    phil@whirlycott.com
                                    http://www.whirlycott.com/phil/



Re: "correct" format for the md5 files?

Posted by Simon Willnauer <si...@googlemail.com>.
Hello,
I'm wondering why people still use MD5 for digital signatures and / or
checksums.
Recent results on the analysis of MD5 reduce the effort to find
collisions to a few minutes on an old notebook. Thus, collision and
multi-collision attacks on MD5 are feasible and practical.
I would recommend migrating directly from MD5 to SHA-2 and adding SHA-2
hashes to existing MD5 lists if possible. Wherever MD5 is still used
to detect the manipulation of data or software, it must be replaced as
soon as possible!

just my 2 cents.

best regards simon

On 12/8/06, Bertrand Delacretaz <bd...@apache.org> wrote:
> On 12/8/06, Chris Hostetter <ho...@fucit.org> wrote:
>
> > ...but it got me wondering, what format do we want?...
>
> The format that Yonik used works (on my macosx system, but also under
> Linux I suspect) with
>
>   md5sum -c apache-solr-1.1.0-incubating.tgz.md5
>
> which is convenient I think.
>
> -Bertrand
>

Re: "correct" format for the md5 files?

Posted by Chris Hostetter <ho...@fucit.org>.
: The format that Yonik used works (on my macosx system, but also under
: Linux I suspect) with
:
:   md5sum -c apache-solr-1.1.0-incubating.tgz.md5

hey look at that ... a "-c" option on md5sum.

The FreeBSD md5 command doesn't seem to have a corresponding check
command, so making sure "md5sum -c" works seems like a worthwhile goal.

Fortunately you can do a lot of amazing things with ant macros ...
unfortunately ant doesn't seem to have any notion of "variables" so it's
not pretty to look at.

"ant package" now builds the md5 files automatically in the same format as
the "md5sum" command ... if anyone sees anything wrong with it we can
always yank it out.
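
For platforms without md5sum, roughly the same check can be sketched in
Python. This is only an approximation of what "md5sum -c" does: the function
name check_md5_file is hypothetical, and unlike md5sum it resolves filenames
relative to the checksum file's directory rather than the current directory.

```python
import hashlib
import re
from pathlib import Path

# One md5sum-style line: 32 hex digits, a space, then ' ' (text mode)
# or '*' (binary mode), then the filename.
LINE = re.compile(r"^([0-9a-fA-F]{32}) [ *](.+)$")


def check_md5_file(md5_path):
    """Return {filename: True/False} for each entry in an md5sum-style file."""
    md5_path = Path(md5_path)
    results = {}
    for line in md5_path.read_text().splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        expected, name = match.group(1).lower(), match.group(2)
        # Assumption: the named file sits next to the checksum file.
        actual = hashlib.md5((md5_path.parent / name).read_bytes()).hexdigest()
        results[name] = actual == expected
    return results
```

A bare-digest file (the old default output) has no filename on the line, so
the sketch skips it, which is the same reason "md5sum -c" can't use it.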


-Hoss


Re: "correct" format for the md5 files?

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 12/8/06, Chris Hostetter <ho...@fucit.org> wrote:

> ...but it got me wondering, what format do we want?...

The format that Yonik used works (on my macosx system, but also under
Linux I suspect) with

  md5sum -c apache-solr-1.1.0-incubating.tgz.md5

which is convenient I think.

-Bertrand