You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2008/12/02 20:52:55 UTC

The bug in 1.5.x (was: Re: Bug in iXhash plugin - fixed version available)

On Tue, 2008-12-02 at 11:32 +0100, Dirk Bonengel wrote:
> I'm looking into it. Only thing - seems to work here.

The reason why hash one is broken is the "workaround" introduced into
version 1.5.
  $body_copy =~ s/[[:graph:]]+//go;

This can not work.

Hash one is supposed to do this: Condense all consecutive, identical
whitespace chars into a single occurrence, then drop the graph chars and
compute the md5sum for the remaining whitespace signature.

The "workaround" above removes words *first*, usually resulting in a
bunch of spaces per line. With the original algorithm, these are exactly
what's being hashed! With the "workaround", they falsely get condensed
into a single space per line once the original hash generation is being
run.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: The bug in 1.5.x

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Tue, 2008-12-02 at 23:48 +0100, Dirk Bonengel wrote:
> Karsten Bräckelmann schrieb:

> > The "workaround" above removes words *first*, usually resulting in a
> > bunch of spaces per line. With the original algorithm, these are exactly
> > what's being hashed! With the "workaround", they falsely get condensed
> > into a single space per line once the original hash generation is being
> > run.
> 
> Not quite. I throw away the content of $body_copy a few lines later and 
> start afresh.

Ugh. You're right, sorry. How could I overlook that? :-/

However, that "workaround" still significantly changes the condition for
hash one, which currently pretty much counts spaces between "words" by
enforcing at least one char after the \s. Hash one will not be computed
for short bodies.
  if (($body_copy =~ /(\s.+?){20}/g ) ||

Using .* rather than .+ will fix this issue with short mail.

Hmm, this just doesn't necessarily explain all the reported non-hits,
cause it doesn't break the hash for long-ish mail.


> The workaround is needed to prevent Perl (at least the one I use) to 
> stall when applying /(\s.+?){20}/ to a complete mail body.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: The bug in 1.5.x

Posted by Dirk Bonengel <di...@gmx.de>.
Karsten Bräckelmann schrieb:
> On Tue, 2008-12-02 at 11:32 +0100, Dirk Bonengel wrote:
>   
>> I'm looking into it. Only thing - seems to work here.
>>     
>
> The reason why hash one is broken is the "workaround" introduced into
> version 1.5.
>   $body_copy =~ s/[[:graph:]]+//go;
>
> This can not work.
>
> Hash one is supposed to do this: Condense all consecutive, identical
> whitespace chars into a single occurrence, then drop the graph chars and
> compute the md5sum for the remaining whitespace signature.
>
> The "workaround" above removes words *first*, usually resulting in a
> bunch of spaces per line. With the original algorithm, these are exactly
> what's being hashed! With the "workaround", they falsely get condensed
> into a single space per line once the original hash generation is being
> run.
>
>   
Not quite. I throw away the content of $body_copy a few lines later and 
start afresh.
The workaround is needed to prevent Perl (at least the one I use) to 
stall when applying /(\s.+?){20}/ to a complete mail body.

Dirk