Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2010/04/17 21:30:51 UTC
Re: cleanup for DNSBLs
Hi Adam,
Some time ago you posted that you were investigating the stats and
effectiveness of a few rules in your masschecks sandbox, and thought I
would see if you had made any progress, and found anything helpful?
Posted below...
Thanks,
Alex
On Mon, Nov 23, 2009 at 8:34 PM, Adam Katz <an...@khopis.com> wrote:
> Unless there are objections, I'm going to add two tests to my sandbox:
>
> RCVD_IN_NIX_SPAM, a new (to us) DNSBL populated by the same source as
> the original [N]iXhash zone, with results on intra2net that look quite
> promising: 72.98:0.12 spam:ham (PSBL has 48.69:0.36),
> http://www.intra2net.com/en/support/antispam/blacklist.php_dnsbl=RCVD_IN_NIX_SPAM.html
>
> RCVD_IN_SPAMCOP, a fix-up of SpamCop to limit it to the last external
> relay (just like every other DNSBL used by SpamAssassin).
>
> While digging around there, I noticed that SpamCop and ham rule
> RCVD_IN_BSP_TRUSTED are the only rules to use check_rbl_txt(), which
> affords it a nicer explanation of what triggered the spam. For a
> fully apples-to-apples comparison, my fix-up reverts back to plain-old
> check_rbl() ... which unfortunately means a second DNS lookup (since
> we're looking for an A record rather than a TXT record).
>
> Both will be marked "nopublish" until we have stats to motivate us.
>
>
> check_rbl_txt() gives quite informative data, and it's supported by
> every DNSBL I've tried (all below). RCVD_IN_NIX_SPAM supports it
> (though my test will avoid it until we can determine there isn't a bug
> in lookups here), as do BRBL and others. Assuming a lack of bugs or
> inefficiency, we should probably use it for any index that doesn't
> contain multiple indices (like zen).
>
> Examples:
>
> $ host -t txt 11.70.132.91.ix.dnsbl.manitu.net.
> 11.70.132.91.ix.dnsbl.manitu.net descriptive text "Spam sent to the
> mailhost mx.selfip.biz was detected by NiX Spam at Mon, 23 Nov 2009
> 23:31:24 +0100, see
> http://www.dnsbl.manitu.net/lookup.php?value=91.132.70.11"
> $ host -t txt 11.70.132.91.bb.barracudacentral.org
> 11.70.132.91.bb.barracudacentral.org descriptive text
> "http://www.barracudanetworks.com/reputation/?pr=1&ip=91.132.70.11"
> $ host -t txt 11.70.132.91.bl.spamcop.net.
> 11.70.132.91.bl.spamcop.net descriptive text "Blocked - see
> http://www.spamcop.net/bl.shtml?91.132.70.11"
> $ host -t txt 11.70.132.91.psbl.surriel.com.
> 11.70.132.91.psbl.surriel.com descriptive text "Listed in PSBL, see
> http://psbl.surriel.com/listing?ip=91.132.70.11"
> $ host -t txt 11.70.132.91.bl.spameatingmonkey.net.
> 11.70.132.91.bl.spameatingmonkey.net descriptive text "listed, see
> http://spameatingmonkey.com/lookup/91.132.70.11"
>
> (If you're wondering, that IP is listed as the #1 offender by spamcop,
> so it hits all of them. 127.0.0.2 gives inaccurate responses since it
> is a test and often is called that.)
>
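For anyone following along, the query names in the examples above are
just the IPv4 octets in reverse order with the DNSBL zone appended. As
described in the quoted message, check_rbl() asks for an A record at
that name and check_rbl_txt() asks for a TXT record. Below is a minimal
Python sketch of the name construction only (no DNS traffic), assuming
IPv4. The helper name dnsbl_query_name is made up for illustration and
is not part of SpamAssassin.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone.

    Hypothetical helper for illustration; not a SpamAssassin function.
    """
    # Validate the input; raises ValueError for anything that is not IPv4.
    addr = ipaddress.IPv4Address(ip)
    # DNSBLs expect the octets in reverse order, like in-addr.arpa names.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    # Drop any leading/trailing dot on the zone so the result is uniform.
    return f"{reversed_octets}.{zone.strip('.')}"


# Zones taken from the "host -t txt" examples in the quoted message.
ZONES = [
    "ix.dnsbl.manitu.net",
    "bb.barracudacentral.org",
    "bl.spamcop.net",
    "psbl.surriel.com",
    "bl.spameatingmonkey.net",
]

if __name__ == "__main__":
    for zone in ZONES:
        print(dnsbl_query_name("91.132.70.11", zone))
```

The printed names can be passed to "host -t a" or "host -t txt" to
compare the A and TXT answers for the same listing.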
Re: cleanup for DNSBLs
Posted by Alex <my...@gmail.com>.
Hi Adam,
>> Some time ago you posted that you were investigating the stats and
>> effectiveness of a few rules in your masschecks sandbox, and thought
>> I would see if you had made any progress, and found anything
>> helpful?
>
> Yeah, analysis (and writing it up) is time-consuming and I was putting
> it off. Here it is.
Thanks for the info. Hope to see further analysis of your efforts in the future.
Best,
Alex
Re: cleanup for DNSBLs
Posted by Adam Katz <an...@khopis.com>.
On 04/17/2010 03:30 PM, Alex wrote:
> Some time ago you posted that you were investigating the stats and
> effectiveness of a few rules in your masschecks sandbox, and thought
> I would see if you had made any progress, and found anything
> helpful?
Yeah, analysis (and writing it up) is time-consuming and I was putting
it off. Here it is.
> On Mon, Nov 23, 2009 at 8:34 PM, Adam Katz <an...@khopis.com> wrote:
>> Unless there are objections, I'm going to add two tests to my sandbox:
>>
>> RCVD_IN_NIX_SPAM, a new (to us) DNSBL populated by the same source as
>> the original [N]iXhash zone, with results on intra2net that look quite
>> promising: 72.98:0.12 spam:ham (PSBL has 48.69:0.36),
>> http://www.intra2net.com/ [...]
DateRev    SPAM%    HAM%    S/O    RANK  NAME
20091219   6.0855   0.0158  0.997  0.91  T_RCVD_IN_NIX_SPAM
20091226   6.6822   0.0171  0.997  0.91  T_RCVD_IN_NIX_SPAM
20100116   8.8194   0.0079  0.999  0.93  T_RCVD_IN_NIX_SPAM
20100123   9.6367   0.0060  0.999  0.94  T_RCVD_IN_NIX_SPAM
Here are all the results ruleqa was willing to yield. I've removed the
runs that had fewer than about a million spams, as the data from those
runs is non-representative for most rules. After January, ruleqa stopped
evaluating the rule (and RCVD_IN_SPAMCOP) altogether, so I'm not
confident in the results, as they never leveled out.
Based on that performance, NiX performs quite well, but not well enough
to justify including it in SA proper, as it would just create too much
DNS traffic.
Jari Fredricksson's recent "Top Ten Rules" post to the list has
RCVD_IN_NIX_SPAM ranked 11th (he posted 20 rules; "Ten" was only in the
thread name) with 72.29% spam versus 16% ham at 0.998 S/O (total
ham+spam corpus = 20293). Jari is in NE Europe, like this DNSBL's
spamtrap fodder. My company gets over 17.6% spam on NiX as well.
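As a rough sanity check on the S/O column in the T_RCVD_IN_NIX_SPAM
table above: the figures are consistent with S/O being the spam share
of a rule's hits, computed from the percentages as
SPAM% / (SPAM% + HAM%). That formula is an assumption inferred from the
numbers, not taken from the ruleqa source. A small Python sketch, with
the rows copied from the table and a made-up function name:

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Spam share of a rule's hits, from per-corpus hit percentages.

    Assumed formula: SPAM% / (SPAM% + HAM%). Hypothetical helper.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit no messages in either corpus")
    return spam_pct / total


# (DateRev, SPAM%, HAM%, reported S/O) from the T_RCVD_IN_NIX_SPAM table.
NIX_ROWS = [
    ("20091219", 6.0855, 0.0158, "0.997"),
    ("20091226", 6.6822, 0.0171, "0.997"),
    ("20100116", 8.8194, 0.0079, "0.999"),
    ("20100123", 9.6367, 0.0060, "0.999"),
]

if __name__ == "__main__":
    for rev, spam, ham, reported in NIX_ROWS:
        computed = f"{s_over_o(spam, ham):.3f}"
        print(rev, "computed:", computed, "reported:", reported)
```

Because the ratio uses percentages rather than raw message counts, a
much larger spam corpus does not by itself push S/O toward 1.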
>> RCVD_IN_SPAMCOP, a fix-up of SpamCop to limit it to the last
>> external relay (just like every other DNSBL used by SpamAssassin).
This again only found four useful trials. The results show that SpamCop
is indeed a well-maintained DNSBL with a very low FP rate, but it
doesn't have the sheer volume of the others.
DateRev    SPAM%    HAM%    S/O    RANK  NAME
20091219  11.9204   0.0390  0.997  0.89  T_RCVD_IN_SPAMCOP
20091226  10.4777   0.0367  0.997  0.88  T_RCVD_IN_SPAMCOP
20100116  12.2375   0.0953  0.992  0.81  T_RCVD_IN_SPAMCOP
20100123  13.7493   0.0324  0.998  0.90  T_RCVD_IN_SPAMCOP
Compared to the full parsing of headers:
DateRev    SPAM%    HAM%    S/O    RANK  NAME
20091219  57.4236   1.8637  0.969  0.62  RCVD_IN_BL_SPAMCOP_NET
20091226  57.1671   1.7706  0.970  0.62  RCVD_IN_BL_SPAMCOP_NET
20100116  58.6552   1.7156  0.972  0.62  RCVD_IN_BL_SPAMCOP_NET
20100123  59.0184   1.6012  0.974  0.62  RCVD_IN_BL_SPAMCOP_NET
... it would be a shame to strike SpamCop, but it doesn't really seem
like much of a player (because it doesn't use spamtraps). In fact, its
lack of spamtraps suggests keeping it, because it's capable of listing
spammers that successfully avoid spamtraps. Maybe I'll open a bug to
use the lastexternal version instead of the current one.
>> While digging around there, I noticed that SpamCop and ham rule
>> RCVD_IN_BSP_TRUSTED are the only rules to use check_rbl_txt(),
>> which affords it a nicer explanation of what triggered the spam.
>> For a fully apples-to-apples comparison, my fix-up reverts back to
>> plain-old check_rbl() ... which unfortunately means a second DNS
>> lookup (since we're looking for an A record rather than a TXT
>> record).
>>
>> Both will be marked "nopublish" until we have stats to motivate
>> us.
>>
>> check_rbl_txt() gives quite informative data, and it's supported
>> by every DNSBL I've tried (all below). RCVD_IN_NIX_SPAM supports
>> it (though my test will avoid it until we can determine there isn't
>> a bug in lookups here), as do BRBL and others. Assuming a lack of
>> bugs or inefficiency, we should probably use it for any index that
>> doesn't contain multiple indices (like zen).
I have no news on this front. That was meant more as a question to
the other developers. I suppose the TXT data is more verbose and
therefore eats more bandwidth, which is why SA doesn't use it?