Posted to dev@spamassassin.apache.org by Daniel Quinlan <qu...@pathname.com> on 2004/04/11 09:29:43 UTC

Re: Please sanity check sa.surbl.org announcement

Jeff Chan <je...@surbl.org> writes:

> Are any 3.0 guys able to comment on whether I got the urirhsbl syntax correct:

It's correct, but you might not need to get it correct because the rule
will likely ship with 3.0 when it is released if it seems to work well
and it helps.

I have a concern about the rule.  Bill Stern's SpamAssassin blacklist is
a blacklist *for* SpamAssassin, so I think the naming of your rule and
the DNSBL name (sa.surbl.org) are unintentionally a bit misleading since
the list is not maintained by the ASF or SpamAssassin.  I think it would
be a good idea to rename the DNSBL and the rule to make this clearer.
Maybe we should encourage (or help) Bill Stern to pick a snappy name.
:-)

Also, it would be better from our perspective if we could get multiple
RBL results from a single query to reduce overhead.  Any of multiple A
(like NJABL, SBL/XBL, or SORBS), bitmask A (like OPM or RBL+), or
multiple TXT (like SBL/XBL) would probably not be too hard to support
(Justin?).
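
To illustrate the bitmask A and multiple A options mentioned above, here is a minimal Python sketch of how a client might decode one combined answer into several list hits. The bit values, return codes, and list names are hypothetical placeholders, not the codes of any real zone, and this is not how SpamAssassin itself implements the lookup.

```python
import ipaddress

# Hypothetical bit assignments for a combined zone; a real zone
# publishes its own mapping.
HYPOTHETICAL_BITS = {
    2: "list_one",
    4: "list_two",
    8: "list_three",
}

# Hypothetical per-list return codes for the "multiple A" style.
HYPOTHETICAL_CODES = {
    "127.0.0.2": "list_one",
    "127.0.0.3": "list_two",
}


def decode_bitmask(a_record):
    """Bitmask A: one A record whose last octet ORs together one bit per list."""
    last_octet = int(ipaddress.IPv4Address(a_record)) & 0xFF
    return [name for bit, name in sorted(HYPOTHETICAL_BITS.items())
            if last_octet & bit]


def decode_multiple_a(a_records):
    """Multiple A: one query returns several A records, one per matching list."""
    return sorted(HYPOTHETICAL_CODES[r] for r in a_records
                  if r in HYPOTHETICAL_CODES)


# 6 = 2 | 4, so a single answer reports hits on two lists at once.
print(decode_bitmask("127.0.0.6"))
print(decode_multiple_a(["127.0.0.3", "127.0.0.2"]))
```

Either way, one DNS round trip replaces one query per list, which is the overhead reduction being asked for.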

> P.S. If we can get some 3.0 developers on discuss@lists.surbl.org,
> perhaps we can take the talk there.

Carbon-copying spamassassin-dev@incubator.apache.org (which is public)
for SpamAssassin issues (all versions) is probably the easiest way to
get SpamAssassin developers involved in a discussion.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

Re: Please sanity check sa.surbl.org announcement

Posted by Daniel Quinlan <qu...@pathname.com>.
Jeff Chan <je...@surbl.org> writes:

> Dooh, I think I just realized you were asking whether multiple
> RBLs could be supported from a single query inside SA, and not
> whether we could combine the lists on the RBL side.  Pardon my
> misinterpretation if that was your original meaning.

Yes, that is what I was asking.  You could continue to support separate
zones as well for people who want to mirror them.

You seem to still have a tendency to reject ideas before reading them
through.  I'm not saying I've never made the same mistake, but this is
getting old.

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

Re: Please sanity check sa.surbl.org announcement

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, April 11, 2004, 12:29:43 AM, Daniel Quinlan wrote:
> Also, it would be better from our perspective if we could get multiple
> RBL results from a single query to reduce overhead.  Any of multiple A
> (like NJABL, SBL/XBL, or SORBS), bitmask A (like OPM or RBL+), or
> multiple TXT (like SBL/XBL) would probably not be too hard to support
> (Justin?).

Dooh, I think I just realized you were asking whether multiple
RBLs could be supported from a single query inside SA, and not
whether we could combine the lists on the RBL side.  Pardon my
misinterpretation if that was your original meaning.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/


Re: Please sanity check sa.surbl.org announcement

Posted by Daniel Quinlan <qu...@pathname.com>.
Jeff Chan <je...@surbl.org> writes:

> Would you please do a spam and ham corpora check with
> "sa.surbl.org" whenever you can?  We'd really like to know any
> false positives to remove if that's possible to determine.

Pretty high FPs.

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  11189     1200     9989    0.107   0.00    0.00  (all messages)
100.000  10.7248  89.2752    0.107   0.00    0.00  (all messages as %)
  6.095  56.2500   0.0701    0.999   1.00    1.00  URIBL_SC_SURBL
  6.855  59.7500   0.5006    0.992   0.98    1.00  URIBL_SBL
  9.545  72.8333   1.9421    0.974   0.95    0.01  T_URIBL_SA_SURBL
  0.116   0.5000   0.0701    0.877   0.58    0.01  T_URIBL_DSBL

The FP rate is higher than even SBL (which gets some collateral damage
with URIBL due to the domain->NS->A method that focuses in on name
servers).  A few of my FPs are actually legitimate anti-spam domains.
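
For readers comparing the rows, the S/O column in the table above is consistent with SPAM% / (SPAM% + HAM%), and HAM% times the ham count gives the number of false positives. The short Python sketch below recomputes both from the figures in the table; the function names are made up for illustration, and the formula is inferred from the numbers rather than taken from the tool's source.

```python
HAM_MESSAGES = 9989  # ham count from the "(all messages)" row

# (SPAM%, HAM%) per rule, copied from the table above.
ROWS = {
    "URIBL_SC_SURBL": (56.2500, 0.0701),
    "URIBL_SBL": (59.7500, 0.5006),
    "T_URIBL_SA_SURBL": (72.8333, 1.9421),
    "T_URIBL_DSBL": (0.5000, 0.0701),
}


def spam_over_overall(spam_pct, ham_pct):
    """Share of a rule's hit rate that falls on spam (assumed S/O formula)."""
    return spam_pct / (spam_pct + ham_pct)


def ham_hits(ham_pct, ham_total=HAM_MESSAGES):
    """Approximate number of ham messages (false positives) the rule hit."""
    return round(ham_pct / 100 * ham_total)


for name, (spam_pct, ham_pct) in ROWS.items():
    print(f"{name:18s} S/O={spam_over_overall(spam_pct, ham_pct):.3f} "
          f"ham hits={ham_hits(ham_pct)}")
```

On these figures T_URIBL_SA_SURBL hits roughly 194 ham messages against about 50 for URIBL_SBL, which is the gap the paragraph above refers to.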

> We'd really like to know any false positives to remove if that's
> possible to determine.

Since the list is not regenerated every 4 days, I'm not sure it's a good
idea for SA corpus maintainers to submit false positives since the FP
rate would then be lower for us, but not much lower for most people.  In
other words, it would introduce a large corpus bias.

> We deliberately did not want to combine Bill's list and mine not
> so much due to not-invented-here syndrome but because their
> source data is so different, and because their size and time
> factors are pretty radically different at present.  I gave some
> of the original reasons in the proposed announcement which I had
> not forwarded here yet, but will now.

You could continue to offer separate queries for people who are
mirroring the zones.  A lot of blacklists offer both separate and
multiple queries.

Daniel
 
-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

Re: Please sanity check sa.surbl.org announcement

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, April 11, 2004, 12:29:43 AM, Daniel Quinlan wrote:
> Jeff Chan <je...@surbl.org> writes:

>> Are any 3.0 guys able to comment on whether I got the urirhsbl syntax correct:

> It's correct, but you might not need to get it correct because the rule
> will likely ship with 3.0 when it is released if it seems to work well
> and it helps.

Let's hope that proves to be the case.  :-)

Would you please do a spam and ham corpora check with
"sa.surbl.org" whenever you can?  We'd really like to know any
false positives to remove if that's possible to determine.

> I have a concern about the rule.  Bill Stern's SpamAssassin blacklist is
> a blacklist *for* SpamAssassin, so I think the naming of your rule and
> the DNSBL name (sa.surbl.org) are unintentionally a bit misleading since
> the list is not maintained by the ASF or SpamAssassin.  I think it would
> be a good idea to rename the DNSBL and the rule to make this clearer.
> Maybe we should encourage (or help) Bill Stern to pick a snappy name.
> :-)

Fair enough.  I picked sa.surbl.org out of thin air since his rule
started with "sa" and to give oblique credit to the original
motivation for his rule, if not the precise source.

Bill, anyone, got some better names?  I prefer two letters.
(sa and sc were kind of confusing anyway.)   "sb" for
Sa-Blacklist anyone?  LOL!  ;-)

> Also, it would be better from our perspective if we could get multiple
> RBL results from a single query to reduce overhead.  Any of multiple A
> (like NJABL, SBL/XBL, or SORBS), bitmask A (like OPM or RBL+), or
> multiple TXT (like SBL/XBL) would probably not be too hard to support
> (Justin?).

We deliberately did not want to combine Bill's list and mine not
so much due to not-invented-here syndrome but because their
source data is so different, and because their size and time
factors are pretty radically different at present.  I gave some
of the original reasons in the proposed announcement which I had
not forwarded here yet, but will now.

>> P.S. If we can get some 3.0 developers on discuss@lists.surbl.org,
>> perhaps we can take the talk there.

> Carbon-copying spamassassin-dev@incubator.apache.org (which is public)
> for SpamAssassin issues (all versions) is probably the easiest way to
> get SpamAssassin developers involved in a discussion.

OK, cross-posting it is...  :-|  LOL!

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/