Posted to users@spamassassin.apache.org by Matus UHLAR - fantomas <uh...@fantomas.sk> on 2007/07/13 14:43:59 UTC

problems with TVD_SPACE_RATIO

Hello,

It seems I have problems with TVD_SPACE_RATIO. As far as I can tell, it
should match vertical words
(http://marc.info/?l=spamassassin-users&m=118431331726635&w=2) 
or messages with many space characters
(http://marc.info/?l=spamassassin-users&m=118427588731549&w=2)

However, when I was testing SA, it matched an e-mail containing only one
word ("test"), and it also matches empty e-mails with only a .zip
attachment (one Slovak bank sends account balances in such encrypted
e-mails).

Could anyone confirm this? Should I file a bug report about this?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
I feel like I'm diagonally parked in a parallel universe. 

Re: problems with TVD_SPACE_RATIO

Posted by Michael Monnerie <mi...@is.it-management.at>.
On Monday 25 May 2009 Justin Mason wrote:
> please attach FPs you can share to tickets on bugzilla.  they do
> help.

I've lowered the score of TVD_SPACE_RATIO to 1.2 points because I have
been getting FPs with it since at least 2007-07-25. I'll try to find FPs
and report them to Bugzilla.
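For reference, a site-wide score override like the one described here is
normally a single `score` line in local.cf. A minimal sketch, assuming a
site-wide override is wanted; the file's location depends on the
installation, and 1.2 is the value mentioned above:

```
# local.cf (site-wide SpamAssassin configuration; path varies by install)
# Lower TVD_SPACE_RATIO to 1.2 points.
# A single value applies to all four score sets.
score TVD_SPACE_RATIO 1.2

# A score of 0 would disable the rule entirely instead:
# score TVD_SPACE_RATIO 0
```

After editing, `spamassassin --lint` checks the configuration for errors,
and a running spamd has to be restarted to pick up the change.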

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


Re: problems with TVD_SPACE_RATIO

Posted by mouss <mo...@ml.netoyen.net>.
Karsten Bräckelmann wrote:
> On Wed, 2009-05-27 at 09:21 +0200, Michael Monnerie wrote:
>> On Wednesday 27 May 2009 mouss wrote:
>>> and 4454 is a one line message, but the signature causes the hit.
> 
> The fact that the mailing-list footer is forced onto the message with no
> newline causes it. And the second hardly counts as human generated. ;)
> 

True, the guy replied without adding any personal comment. But these
things do happen:

- I have a problem with foobar
- what does /sbin/joe show
- $joe_output

Now, the question is: what should this rule really catch?

If I may suggest something, I would propose the following whenever a
rule is designed:
- document what it should catch
- give examples of things it catches and things it shouldn't catch (so
that if someone modifies the rule, they have some hints about what they
can change)


Of course, if there's work to do, count me in (well, subject to my
availability...).

>> And my messages are just one-liners without .sig that should never hit 
>> this rule at all.
> 
> Checked those samples from both of you. Lots more analysis of this eval
> function added to the bug report.
> 
> See comment 12. Smells kinda fishy to me, and probably broke at some
> point since its original introduction. :/
> 
> 
>> I don't have other examples in original format, but just a few days ago 
>> I got an FP report where this rule hit a normal, German, human-typed mail.
>> I'll restore the original score now to see if I get more reports.
> 
> Hmm, I'd love to see that one. Any *human-typed* mail featuring a real
> sentence should not trigger this. Unless it's followed directly by a
> huge machine-generated paste or something, without an empty line...
> 


I'm not sure, but I'll have to dig through my mail before I can show
anything real.


Re: problems with TVD_SPACE_RATIO

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Wed, 2009-05-27 at 09:21 +0200, Michael Monnerie wrote:
> On Wednesday 27 May 2009 mouss wrote:
> > and 4454 is a one line message, but the signature causes the hit.

The fact that the mailing-list footer is forced onto the message with no
newline causes it. And the second hardly counts as human generated. ;)

> And my messages are just one-liners without .sig that should never hit 
> this rule at all.

Checked those samples from both of you. Lots more analysis of this eval
function added to the bug report.

See comment 12. Smells kinda fishy to me, and probably broke at some
point since its original introduction. :/


> I don't have other examples in original format, but just a few days ago 
> I got an FP report where this rule hit a normal, German, human-typed mail.
> I'll restore the original score now to see if I get more reports.

Hmm, I'd love to see that one. Any *human-typed* mail featuring a real
sentence should not trigger this. Unless it's followed directly by a
huge machine-generated paste or something, without an empty line...


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: problems with TVD_SPACE_RATIO

Posted by Michael Monnerie <mi...@is.it-management.at>.
On Wednesday 27 May 2009 mouss wrote:
> and 4454 is a one line message, but the signature causes the hit.

And my messages are just one-liners without .sig that should never hit 
this rule at all.

I don't have other examples in original format, but just a few days ago 
I got an FP report where this rule hit a normal, German, human-typed mail.
I'll restore the original score now to see if I get more reports.

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc.
----------------------------------
Sorcerers have their magic wands:
  powerful, potentially dangerous tools with a life of their own.
Witches have their familiars:
  creatures disguised as household beasts that could,
  if they choose, wreak the witches' havoc.
Mystics have their golems:
  beings built of wood and tin brought to life to do their
  masters' bidding.
I have Linux.
----------------------------------


Re: problems with TVD_SPACE_RATIO

Posted by mouss <mo...@ml.netoyen.net>.
Karsten Bräckelmann wrote:
> On Tue, 2009-05-26 at 22:12 +0200, mouss wrote:
>> Karsten Bräckelmann wrote:
> 
>>> Bug 6119 has been opened already. Please attach additional samples
>>> there, rather than opening a new bug for every sample.  Thanks!
>>>
>>>   https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6119
>> I've attached a few. If more is needed, just ask...
> 
> No more auto-generated content, please. :)  Real, human written samples
> on the other hand...
> 

4454 and 4455 are "human written" (if people who spend a lot of time in
front of a terminal still count as human ;)

And 4454 is a one-line message, but the signature causes the hit.


> See comment 7.
> 
>   guenther
> 


Re: problems with TVD_SPACE_RATIO

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Tue, 2009-05-26 at 22:12 +0200, mouss wrote:
> Karsten Bräckelmann wrote:

> > Bug 6119 has been opened already. Please attach additional samples
> > there, rather than opening a new bug for every sample.  Thanks!
> > 
> >   https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6119
> 
> I've attached a few. If more is needed, just ask...

No more auto-generated content, please. :)  Real, human written samples
on the other hand...

See comment 7.

  guenther



Re: problems with TVD_SPACE_RATIO

Posted by mouss <mo...@ml.netoyen.net>.
Karsten Bräckelmann wrote:
> On Mon, 2009-05-25 at 22:09 +0100, Justin Mason wrote:
>> please attach FPs you can share to tickets on bugzilla.  they do help.
> 
> Bug 6119 has been opened already. Please attach additional samples
> there, rather than opening a new bug for every sample.  Thanks!
> 
>   https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6119
> 
> 


I've attached a few. If more is needed, just ask...


Re: problems with TVD_SPACE_RATIO

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2009-05-25 at 22:09 +0100, Justin Mason wrote:
> please attach FPs you can share to tickets on bugzilla.  they do help.

Bug 6119 has been opened already. Please attach additional samples
there, rather than opening a new bug for every sample.  Thanks!

  https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6119




Re: problems with TVD_SPACE_RATIO

Posted by Justin Mason <jm...@jmason.org>.
please attach FPs you can share to tickets on bugzilla.  they do help.

--j.

On Mon, May 25, 2009 at 21:37, mouss <mo...@ml.netoyen.net> wrote:
> Matus UHLAR - fantomas wrote:
>> Hello,
>>
>> Re-sending an old issue, since I often see this problem.
>>
>> On 13.07.07 14:43, Matus UHLAR - fantomas wrote:
>>> It seems I have problems with TVD_SPACE_RATIO. As far as I can tell, it
>>> should match vertical words
>>> (http://marc.info/?l=spamassassin-users&m=118431331726635&w=2)
>>> or messages with many space characters
>>> (http://marc.info/?l=spamassassin-users&m=118427588731549&w=2)
>>>
>>> However, when I was testing SA, it matched an e-mail containing only one
>>> word ("test"), and it also matches empty e-mails with only a .zip
>>> attachment (one Slovak bank sends account balances in such encrypted
>>> e-mails).
>>
>> Not only .zip, but many others as well, and not only from this bank.
>>
>>> Could anyone confirm this? Should I file a bug report about this?
>>
>> Does anyone see many FPs caused by this rule?
>>
>
> Yes, the rule produces FPs on many messages from the
>        freebsd-ports-bugs@freebsd.org
> list, though not enough to push them above the 5.0 threshold.
>
> Can you give more info about the FP messages? Is there a way to write a
> rule that cancels TVD_SPACE_RATIO for these?
>
>

Re: problems with TVD_SPACE_RATIO

Posted by mouss <mo...@ml.netoyen.net>.
Matus UHLAR - fantomas wrote:
> Hello,
> 
> Re-sending an old issue, since I often see this problem.
> 
> On 13.07.07 14:43, Matus UHLAR - fantomas wrote:
>> It seems I have problems with TVD_SPACE_RATIO. As far as I can tell, it
>> should match vertical words
>> (http://marc.info/?l=spamassassin-users&m=118431331726635&w=2) 
>> or messages with many space characters
>> (http://marc.info/?l=spamassassin-users&m=118427588731549&w=2)
>>
>> However, when I was testing SA, it matched an e-mail containing only one
>> word ("test"), and it also matches empty e-mails with only a .zip
>> attachment (one Slovak bank sends account balances in such encrypted
>> e-mails).
> 
> Not only .zip, but many others as well, and not only from this bank.
> 
>> Could anyone confirm this? Should I file a bug report about this?
> 
> Does anyone see many FPs caused by this rule?
> 

Yes, the rule produces FPs on many messages from the
	freebsd-ports-bugs@freebsd.org
list, though not enough to push them above the 5.0 threshold.

Can you give more info about the FP messages? Is there a way to write a
rule that cancels TVD_SPACE_RATIO for these?
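One possible approach to that question, sketched under assumptions: a
meta rule can fire only when TVD_SPACE_RATIO hits on mail from the list
and carry a negative score that offsets it. The rule names, the List-Id
pattern for freebsd-ports-bugs, and the -1.0 offset are hypothetical; the
offset would need to match whatever score TVD_SPACE_RATIO carries on the
local install.

```
# Hypothetical local rules; names and List-Id pattern are assumptions.
# Sub-rule: the "__" prefix means it is not scored on its own.
header   __LOCAL_FREEBSD_PORTS_BUGS  List-Id =~ /freebsd-ports-bugs\.freebsd\.org/

# Fires only when TVD_SPACE_RATIO hits on mail from that list.
meta     LOCAL_TVD_LIST_OFFSET  (TVD_SPACE_RATIO && __LOCAL_FREEBSD_PORTS_BUGS)
describe LOCAL_TVD_LIST_OFFSET  Offsets TVD_SPACE_RATIO on freebsd-ports-bugs mail
# Hypothetical value: set it to minus the local TVD_SPACE_RATIO score.
score    LOCAL_TVD_LIST_OFFSET  -1.0
```

This only neutralizes the rule for one list; it does not address the
underlying FP cause being tracked in bug 6119.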

Re: problems with TVD_SPACE_RATIO

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
Hello,

Re-sending an old issue, since I often see this problem.

On 13.07.07 14:43, Matus UHLAR - fantomas wrote:
> It seems I have problems with TVD_SPACE_RATIO. As far as I can tell, it
> should match vertical words
> (http://marc.info/?l=spamassassin-users&m=118431331726635&w=2) 
> or messages with many space characters
> (http://marc.info/?l=spamassassin-users&m=118427588731549&w=2)
> 
> However, when I was testing SA, it matched an e-mail containing only one
> word ("test"), and it also matches empty e-mails with only a .zip
> attachment (one Slovak bank sends account balances in such encrypted
> e-mails).

Not only .zip, but many others as well, and not only from this bank.

> Could anyone confirm this? Should I file a bug report about this?

Does anyone see many FPs caused by this rule?

-- 
Windows found: (R)emove, (E)rase, (D)elete