Posted to users@spamassassin.apache.org by Kai Schaetzl <ma...@conactive.com> on 2006/03/31 23:09:39 UTC

1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

I just saw that a normal Ebay outbid notice hit two high-score rules. One 
is from sare-spoof and I already contacted the maintainer. But one is in 
the default 3.1.1 ruleset and I think this rule should get completely 
removed or get a score of 0. It's

1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

From grepping the rules, it does what it says: it checks whether there are 
two B/Q encoding identifiers in the subject. Why does this score 1.72, or 
anything at all? This is absolutely valid Q/B encoding and is actually 
*required* by RFC 2047 once the subject no longer fits into a single 
encoded-word, which is limited to 75 characters including the charset and 
encoding markers (so it's actually more like a 60 raw character limit).
This rule will hit on *lots* of non-ASCII mail and on almost all mail 
coming from Ebay Germany.
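
To illustrate why a long non-ASCII subject ends up with more than one
encoded-word, here is a minimal Python sketch. It encodes a made-up German
subject with the standard library's email.header module and counts the
encoded-words with a regular expression. The regex and the helper name
count_encoded_words are assumptions for illustration only; this is not the
actual SUBJECT_ENCODED_TWICE rule definition, just an approximation of
"two B/Q encoding identifiers in the subject".

```python
import re
from email.header import Header, decode_header, make_header

# Rough pattern for an RFC 2047 encoded-word: =?charset?B-or-Q?text?=
# (an assumption for illustration, not SpamAssassin's own regex)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def count_encoded_words(raw_subject: str) -> int:
    """Count the encoded-words in a raw (still encoded) Subject value."""
    return len(ENCODED_WORD.findall(raw_subject))


# Hypothetical subject, too long to fit into a single encoded-word.
subject = (
    "Sie wurden überboten: Schöne große Übergangsjacke für Mädchen, "
    "Größe 128, neuwertig und ungetragen, Versand möglich"
)

# Encode the way a mail client would, then decode it again.
raw = Header(subject, charset="iso-8859-1").encode()
decoded = str(make_header(decode_header(raw)))

print(raw)                       # folded header value with several encoded-words
print(count_encoded_words(raw))  # number of encoded-words found
print(decoded == subject)        # the split encoding still decodes cleanly
```

A perfectly ordinary subject like this one is split into several
encoded-words by the encoder, so a check that merely counts them cannot
tell such mail apart from anything suspicious.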

There are also the rules SUBJECT_EXCESS_QP and SUBJECT_EXCESS_BASE64 which 
are "similar". QP scores 0 and BASE64 scores 0.449. This is much more 
reasonable.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com




Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

Posted by mouss <us...@free.fr>.
Kai Schaetzl wrote:
> I just saw that a normal Ebay outbid notice hit two high-score rules. One 
> is from sare-spoof and I already contacted the maintainer. But one is in 
> the default 3.1.1 ruleset and I think this rule should get completely 
> removed or get a score of 0. It's
> 
> 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
> 
> From grepping the rules it does what it says: it checks if there are two 
> B/Q encoding identifiers in the subject. Why is this scoring with 1.72 or 
> at all? This is absolutely valid Q/B encoding and actually *required* by 
> RFC if your subject line is longer than 80 (or was it 72?) characters 
> (minus the encoding, so it's actually more like a 60 raw character limit).
> This rule will hit on *lots* of non-ASCII mail and on almost all mail 
> coming from Ebay Germany.
> 
> There are also the rules SUBJECT_EXCESS_QP and SUBJECT_EXCESS_BASE64 which 
> are "similar". QP scores 0 and BASE64 scores 0.449. This is much more 
> reasonable.
> 

Same here (multiple FPs). I disabled these rules.
Many popular MSPs here in .fr use software that triggers them. On days 
when I feel angry and bad, I can block caramail, but I can never block 
laposte.net and wanadoo...

For similar reasons, I had to disable (after lowering the score 
incrementally) some *bl lists. For now, these are sorbs, rfci, 
bad_whois and spamcop. The list seems to be growing :)

Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

Posted by Michael Monnerie <mi...@it-management.at>.
On Thursday, 13 April 2006 13:35 Mark Martinec wrote:
> Agreed, this rule is completely inappropriate, it penalizes valid
> encoding according to RFC 2047 and fires on any lengthier Subject
> line in non-English language. It should disappear or have a
> much reduced default score.

The problem seems to be that
1) most spam is in English
2) most people contributing mass-checks are English-speaking
3) therefore most ham+spam tested in mass-checks is in English

To improve the situation, more mass-check testers with 
non-English ham+spam should contribute, see 
http://wiki.apache.org/spamassassin/MassCheck?highlight=%28mass%29

I'm not an SA dev, but I think they once wrote that more supporters would 
be nice. I do mass-checks, and if somebody wants to help, I have a working 
script you can have in order to contribute to testing. It's a simple 
setup, and then your server has some work to do overnight. On mine, 
it's about 1 hour per night, so no problem.

Best regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660/4156531                          .network.your.ideas.
// PGP Key:   "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net                 Key-ID: 0x55CBA4EE

Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Apr 13, 2006 at 01:35:19PM +0200, Mark Martinec wrote:
> Agreed, this rule is completely inappropriate, it penalizes valid
> encoding according to RFC 2047 and fires on any lengthier Subject
> line in non-English language. It should disappear or have a
> much reduced default score.

Says you. ;)

  OVERALL%  SPAM%    HAM%      S/O     RANK    SCORE NAME
  1.047   1.4619   0.0792    0.949   0.58    0.89  SUBJECT_ENCODED_TWICE

So in the results used to generate scores, that rule is ~94.9% accurate,
and hits ~1.46% of all spam.  In a recent nightly mass-check run:

  OVERALL%  SPAM%    HAM%      S/O     RANK    SCORE NAME
  1.153   1.4173   0.1151    0.925   0.73    0.89  SUBJECT_ENCODED_TWICE

So more ham seems to use encoding twice in the subject, and a little
less spam uses it.  Based on this, my guess is the generated score would
go down.
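
The accuracy figures quoted above can be reproduced from the spam and ham
hit percentages. A minimal sketch, assuming that the second and third
numbers in each row are the percentage of spam and of ham that the rule
hit, and that the fourth number is spam% / (spam% + ham%). That reading is
an assumption, but it matches both quoted rows. The function name
spam_over_overall is made up for illustration.

```python
def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that land on spam (assumed definition)."""
    return spam_pct / (spam_pct + ham_pct)


# (spam%, ham%) taken from the two result rows quoted above
score_generation_run = (1.4619, 0.0792)
recent_nightly_run = (1.4173, 0.1151)

for label, (spam_pct, ham_pct) in [
    ("score generation", score_generation_run),
    ("recent nightly", recent_nightly_run),
]:
    print(f"{label}: {spam_over_overall(spam_pct, ham_pct):.3f}")
```

This gives 0.949 for the score-generation results and 0.925 for the nightly
run. The drop comes almost entirely from the ham side: the ham hit rate
rose from 0.0792% to 0.1151% while the spam hit rate barely moved.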

The thing to remember about rules is that they neither necessarily
look for RFC non-compliance, nor do they avoid RFC compliant mails.
They look for features that hit spam and try to avoid hitting ham.
The key there is that rule development occurs with the results people
make available.  If the people generating results don't receive ham
mails that, for instance, use multiple encodings in a Subject header,
the results won't indicate that it occurs in ham very much.

-- 
Randomly Generated Tagline:
"I protect home plate like a mormon girl on prom night."
         - Mimi on the Drew Carey show

Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

Posted by Mark Martinec <Ma...@ijs.si>.
Kai Schaetzl wrote:
> > I just saw that a normal Ebay outbid notice hit two high-score rules. One
> > is from sare-spoof and I already contacted the maintainer. But one is in
> > the default 3.1.1 ruleset and I think this rule should get completely
> > removed or get a score of 0. It's
> > 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

Alan Premselaar:
> This utterly wreaks havoc on just about all Japanese email, so I dropped
> the score to nearly nothing.

Agreed, this rule is completely inappropriate, it penalizes valid
encoding according to RFC 2047 and fires on any lengthier Subject
line in non-English language. It should disappear or have a
much reduced default score.

  Mark

Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

Posted by Alan Premselaar <al...@12inch.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kai Schaetzl wrote:
> I just saw that a normal Ebay outbid notice hit two high-score rules. One 
> is from sare-spoof and I already contacted the maintainer. But one is in 
> the default 3.1.1 ruleset and I think this rule should get completely 
> removed or get a score of 0. It's
> 
> 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
> 
> From grepping the rules it does what it says: it checks if there are two 
> B/Q encoding identifiers in the subject. Why is this scoring with 1.72 or 
> at all? This is absolutely valid Q/B encoding and actually *required* by 
> RFC if your subject line is longer than 80 (or was it 72?) characters 
> (minus the encoding, so it's actually more like a 60 raw character limit).
> This rule will hit on *lots* of non-ASCII mail and on almost all mail 
> coming from Ebay Germany.
> 
> There are also the rules SUBJECT_EXCESS_QP and SUBJECT_EXCESS_BASE64 which 
> are "similar". QP scores 0 and BASE64 scores 0.449. This is much more 
> reasonable.
> 
> Kai
> 

This utterly wreaks havoc on just about all Japanese email, so I dropped
the score to nearly nothing.

alan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEPfgmE2gsBSKjZHQRAt82AKDAY4xTmST0kaY5cje1xH1ScDajOACg6fMH
msifLKqJuv1IpudxbKGDcfQ=
=ZDQE
-----END PGP SIGNATURE-----