Posted to dev@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2011/04/15 02:56:36 UTC

One meta to rule them all (Overlapping blacklists)

On 04/11, Warren Togami Jr. wrote:
>>>> Before that rescoring, we may want to have a serious
>>>> discussion about reducing score pile-up in the case where
>>>> multiple production DNSBL's all hit at the same time.  Adam
>>>> Katz' approach is one possibility, albeit confusing to users
>>>> because users see subtractions in the score reports.  There may
>>>> be other better approaches to this.

On 04/12/2011 12:59 PM, darxus@chaosreigns.com wrote:
>>> What was Adam Katz's approach?  Not using black or white lists 
>>> just because they overlap is unfortunate.  So is the reduction of
>>> generated scores that overlap probably causes.

I have two proposals, both of which have been mentioned here in the
past.  Warren was referring to the first:


1. One meta to rule them all

This is very simple.  All it should require is removing the 'nopublish'
flag from PUBLISHED_DNSBLS, so that the meta is published and scored,
and probably renaming it to something like "RCVD_IN_DNSBL" or "DNSBL"
to avoid confusion.

This should result in a large score for the meta and therefore reduced
scores for the individual rules.  However, since the GA (the genetic
algorithm that generates scores) isn't always rational, it could miss
the overlap and create dangerous scores, so we might have to score the
meta (and/or the lookups) manually.
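To illustrate the pile-up problem and what a combined meta buys, here is
a minimal Python sketch.  The rule names BL_A/BL_B/BL_C, every score,
and the assumption that the meta fires whenever any listed DNSBL hits
are hypothetical; none of them are taken from PUBLISHED_DNSBLS or from
a real rescoring run.

```python
"""Toy comparison of DNSBL score pile-up with and without a combined meta.

All rule names and scores are hypothetical illustrations.
"""

# Independent scoring: every DNSBL rule keeps a full-size score.
INDIVIDUAL = {"BL_A": 2.0, "BL_B": 1.5, "BL_C": 2.5}

# Combined scoring: assume the score generator moves most of the weight
# onto the meta and leaves small residual scores on the lookups.
REDUCED = {"BL_A": 0.5, "BL_B": 0.5, "BL_C": 0.5}
META_SCORE = 2.5


def pileup_score(hits):
    """Total DNSBL score when each rule scores independently."""
    return sum(INDIVIDUAL[name] for name in hits)


def meta_score(hits):
    """Total DNSBL score with reduced lookups plus a combined meta.

    Assumes (hypothetically) that the meta fires when at least one
    listed DNSBL hits.
    """
    total = sum(REDUCED[name] for name in hits)
    if hits:
        total += META_SCORE
    return total


if __name__ == "__main__":
    for hits in (["BL_A"], ["BL_A", "BL_B"], ["BL_A", "BL_B", "BL_C"]):
        print(hits, pileup_score(hits), meta_score(hits))
```

With these made-up numbers, one hit scores 2.0 versus 3.0, two hits
score 3.5 either way, and all three score 6.0 versus 4.0: the meta
narrows the gap between "listed once" and "listed everywhere".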

This was mentioned (not for the first time) at
http://old.nabble.com/I-want-MORE-SPAM---MORE-SPAM-tt23599323.html#a23602101

It should be noted that KHOP_DNSBL_ADJ and KHOP_DNSBL_BUMP (from my
khop-bl sa-update channel) implement this as a third-party hack.  The
former calculates when the score has been pushed too high and then
reduces it, while the latter handles the case where the score isn't
high enough.  Such a hack is very, very messy, and happily it is
completely unnecessary upstream given a rule like PUBLISHED_DNSBLS.

Also note that this process is replicated in KHOP_URIBL_ADJ, and there
is a similar trick for whitelists in KHOP_RCVD_TRUST.

Since I've kept these rules out of Subversion, you'll have to view the
channel itself.  I have a copy of the relevant rule file at:

http://khopis.com/sa/khop-bl/khop-bl.cf

On 04/14/2011 06:26 AM, Greg Troxel wrote:
>> I suggest adding a metarule to combine two blacklists or two
>> whitelists, and see what the existing score-generation procedure
>> gives it.  If my idea is confused, then most such metarules might
>> have near-zero scores. If one ends up with A=2 B=4 and A_and_B
>> getting -1, that validates the concept.
>> 
>> This is sort of like KHOP_DNSBL_BUMP, but letting the GA set the
>> value.

Yes, exactly my intent.  I couldn't do that on the channel without
re-scoring upstream rules, which I really didn't want to do.
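Greg's hypothetical outcome (A=2, B=4, A_and_B=-1) can be checked with a
few lines of arithmetic.  The function below is a hypothetical
illustration, not part of SpamAssassin; the overlap meta acts like an
inclusion-exclusion correction, taking back part of the double count
only when both lists hit.

```python
def combined_score(a_hit, b_hit, score_a=2.0, score_b=4.0, score_a_and_b=-1.0):
    """Score from two blacklist rules plus an A_and_B overlap meta.

    Defaults are the example values from Greg's message; they are
    illustrative, not generated scores.
    """
    total = 0.0
    if a_hit:
        total += score_a
    if b_hit:
        total += score_b
    if a_hit and b_hit:
        # The overlap meta only fires when both underlying rules hit.
        total += score_a_and_b
    return total


if __name__ == "__main__":
    for a_hit in (False, True):
        for b_hit in (False, True):
            print(a_hit, b_hit, combined_score(a_hit, b_hit))
```

A lone hit keeps its full score, while the pair totals 5 rather than 6.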

On 04/14/2011 07:58 AM, John Hardin wrote:
> I'd first verify the assumption that the score generator will
> generate negative scores. I don't know that it does not, but there
> are only 56 rules with negative scores and almost all look manually
> assigned. I suspect that automatic generation of negative scores is
> intentionally suppressed to avoid inadvertently opening up "magical
> bypass" rules for spammers.

We shouldn't need negative scores.  With the adjuster (the combined
meta) in the picture, it should get the big score and its RCVD_IN_*
dependencies will get reduced scores.  ... BIG POTENTIAL HURDLE:  users
who have tweaked the existing rules will have a very high FP risk,
since their local score overrides would replace the newly reduced
defaults and stack on top of the meta's large score.  The best solution
is therefore to rename everything (yuck!).
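The hurdle is easy to see with a little arithmetic.  In this minimal
Python sketch, a local override (the kind of score line an admin might
put in local.cf) replaces the shipped default for the same rule name.
The rule names BL_A/BL_B/DNSBL_META and all scores are hypothetical.

```python
"""How local score overrides can stack on top of a new combined meta.

Rule names and scores are hypothetical.
"""

OLD_DEFAULTS = {"BL_A": 2.0, "BL_B": 1.5}
# After rescoring with a meta: lookups shrink, the meta carries the weight.
NEW_DEFAULTS = {"BL_A": 0.5, "BL_B": 0.5, "DNSBL_META": 2.5}

# An admin who had raised the lookup scores locally.
LOCAL_OVERRIDES = {"BL_A": 3.0, "BL_B": 2.5}


def effective_scores(defaults, overrides):
    """Overrides win over defaults for any rule name they mention."""
    scores = dict(defaults)
    scores.update(overrides)
    return scores


def total(scores, hits):
    """Sum the scores of the rules that hit; unknown names score 0."""
    return sum(scores.get(name, 0.0) for name in hits)


if __name__ == "__main__":
    old_hits = ["BL_A", "BL_B"]
    new_hits = ["BL_A", "BL_B", "DNSBL_META"]
    tweaked_old = effective_scores(OLD_DEFAULTS, LOCAL_OVERRIDES)
    tweaked_new = effective_scores(NEW_DEFAULTS, LOCAL_OVERRIDES)
    print("defaults, before:", total(OLD_DEFAULTS, old_hits))
    print("defaults, after: ", total(NEW_DEFAULTS, new_hits))
    print("tweaked, before: ", total(tweaked_old, old_hits))
    print("tweaked, after:  ", total(tweaked_new, new_hits))
```

An untouched install stays at 3.5, but the tweaked one jumps from 5.5
to 8.0 without the admin changing anything.  Renaming the rules avoids
this because the old overrides would no longer match any rule.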

Regarding desirable negative rules ... 'tflags nice' is a really bad
idea here, since this isn't a nice (ham-indicating) rule.
KHOP_DNSBL_ADJ is (probably) a unique case in which a spam rule needs a
negative score.

>> Perhaps Adam can explain where those scores come from - I certainly 
>> think they are a good manual guess, but it would be interesting if
>> it's more than that.

The multipliers in KHOP_DNSBL_ADJ are derived from the scores of the
rules they modify, so as to approximate the total score coming from the
rules in question.  I don't keep them perfectly in sync (it doesn't
matter much unless those scores change dramatically).  As for the score
of KHOP_DNSBL_ADJ itself, it came from the calculated average of the
messages it was hitting (some of the math is in the comments), with the
aim of bringing the total DNSBL score below five.
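The approach described above can be sketched in Python as follows.  This
is not the actual rule from khop-bl.cf: the rule names, multipliers, the
firing threshold of 5.0, and the target of 4.5 (standing in for "below
five") are all hypothetical.

```python
"""Sketch of a DNSBL score adjuster in the spirit of KHOP_DNSBL_ADJ.

Names, multipliers, threshold, and target are hypothetical.
"""

# Multipliers mirror the scores of the rules they track, so the weighted
# sum approximates the DNSBL score a message has accumulated.
MULTIPLIERS = {"BL_A": 2.0, "BL_B": 1.5, "BL_C": 2.5}


def estimated_dnsbl_score(hits):
    """Approximate the combined DNSBL score from which rules hit."""
    return sum(MULTIPLIERS[name] for name in hits)


def adjuster_fires(hits, threshold=5.0):
    """The adjuster only hits when the estimate is already too high."""
    return estimated_dnsbl_score(hits) > threshold


def pick_adjuster_score(dnsbl_totals, target=4.5):
    """Choose one fixed score from the average DNSBL total of the
    messages the adjuster hits, aiming to pull that average down to
    `target`.  Never returns a positive value."""
    average = sum(dnsbl_totals) / len(dnsbl_totals)
    return min(0.0, target - average)


if __name__ == "__main__":
    print(adjuster_fires(["BL_A", "BL_B"]))          # 3.5 is not above 5.0
    print(adjuster_fires(["BL_A", "BL_B", "BL_C"]))  # 6.0 is above 5.0
    print(pick_adjuster_score([6.0, 7.0, 8.0]))
```

Because a rule's score is fixed, one adjuster score can only center the
average message it hits; messages well above that average still end up
above the target.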

KHOP_DNSBL_BUMP follows a similar philosophy: if a highly trustworthy
DNSBL hits AND the combined DNSBL score isn't already too high, it's
safe to add a few points.  Its two-point score is simply my own
judgment.
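The BUMP condition reduces to a two-part test.  Below is a minimal
Python sketch; which lists count as "highly trustworthy", their scores,
and the ceiling of 5.0 are hypothetical, while the two-point bump is
the value given above.

```python
"""Sketch of the KHOP_DNSBL_BUMP idea: add points only when a trusted
list hits and the combined DNSBL score is still modest.

Trusted set, scores, and ceiling are hypothetical; the 2.0 bump is the
value mentioned in the message.
"""

SCORES = {"BL_A": 2.0, "BL_B": 1.5, "BL_C": 2.5}
TRUSTED = {"BL_A"}
BUMP = 2.0


def dnsbl_total(hits):
    """Combined score of the DNSBL rules that hit."""
    return sum(SCORES[name] for name in hits)


def bump_applies(hits, ceiling=5.0):
    """True when a trusted list hit AND the total is not already high."""
    return bool(TRUSTED.intersection(hits)) and dnsbl_total(hits) < ceiling


def score_with_bump(hits, ceiling=5.0):
    """DNSBL total plus the bump when its conditions are met."""
    total = dnsbl_total(hits)
    if bump_applies(hits, ceiling):
        total += BUMP
    return total


if __name__ == "__main__":
    for hits in (["BL_B"], ["BL_A"], ["BL_A", "BL_B", "BL_C"]):
        print(hits, score_with_bump(hits))
```

The ceiling check is what keeps the bump from adding to a pile-up that
is already large.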



(That was long enough for one email.  My second proposal, regarding a
new breed of short-circuiting that would prevent frivolous rule checks,
including DNSBL lookups, will be sent in its own email.)