Posted to dev@spamassassin.apache.org by Jeff Chan <je...@surbl.org> on 2004/09/23 09:58:42 UTC

ANNOUNCE: Adding new JP list to multi.surbl.org

[Please post follow ups to the SURBL discuss list or to me.]

One of the distinct data sources currently feeding into
ws.surbl.org includes data from Joe Wein and Raymond Dijkxhoorn
with his colleagues at Prolocation.  Raymond and Prolocation
are currently processing more than 300,000 potential spams per
day using Joe's jwSpamSpy server software and combining those
with Joe's own results.  In addition to the data processing
software, Joe has an elaborate, thorough, and well-thought-out
set of inclusion criteria which includes age of domain
registration, manual checks, and other factors.  The resulting
data are an extensive list of spam URI domains with a very
low false positive rate (hits on legitimate messages).  We
are calling this resulting data JP for Joe Wein + Prolocation.

The bottom line is that JP (called PJ in the table below) has a
significantly lower false positive rate than WS while having
similar spam detection rates, for example as measured against a
large set of corpora belonging to Theo Van Dinter of SpamAssassin:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
2424443  2357143    67300    0.972   0.00    0.00  (all messages)
100.000  97.2241   2.7759    0.972   0.00    0.00  (all messages as %)
  7.595   7.8122   0.0045    0.999   1.00    0.00  URIBL_SC_SURBL
 76.754  78.9448   0.0178    1.000   0.80    0.00  URIBL_OB_SURBL
 77.230  79.4340   0.0208    1.000   0.60    1.00  URIBL_PJ_SURBL
  0.985   1.0126   0.0045    0.996   0.50    0.00  URIBL_AB_SURBL
 82.119  84.4600   0.1367    0.998   0.40    0.00  URIBL_WS_SURBL
  0.021   0.0216   0.0045    0.829   0.00    0.00  URIBL_PH_SURBL
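
To put the comparison in numbers, here is a small Python sketch that takes
the SPAM% and HAM% figures for PJ and WS from the table above and compares
them.  The variable names are only illustrative, and the calculation uses
nothing beyond the two table rows.

```python
# (SPAM%, HAM%) copied from the PJ and WS rows of the table above.
rates = {
    "URIBL_PJ_SURBL": (79.4340, 0.0208),
    "URIBL_WS_SURBL": (84.4600, 0.1367),
}

pj_spam, pj_ham = rates["URIBL_PJ_SURBL"]
ws_spam, ws_ham = rates["URIBL_WS_SURBL"]

# How many times more often WS hits legitimate mail than PJ does.
fp_ratio = ws_ham / pj_ham

# Spam detection that PJ gives up relative to WS, in percentage points.
spam_gap = ws_spam - pj_spam

print("WS hits ham %.1f times as often as PJ" % fp_ratio)
print("WS detects %.1f more percentage points of spam" % spam_gap)
```

On this corpus WS hits ham roughly six and a half times as often as PJ,
while detecting about five percentage points more spam.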

So we feel the data could usefully be broken out into a
separate list which could safely be scored higher than
WS.  We also continue to work on improving the False Positive
rate of WS of course.  We propose making JP a separate list
within multi.surbl.org, but *not* a standalone list like
jp.surbl.org, since it's a major effort to set up entirely
new lists and most people should be using multi now.

The main reason for announcing this change ahead of time
is to allow developers of the many programs (in addition to
SpamAssassin) now using SURBL data to update their code or
configurations to take into account that the result codes in
multi will be changing as a result of adding JP.  JP would get
the 64 bitmask, as in: 

 2 = comes from sc.surbl.org
 4 = comes from ws.surbl.org
 8 = comes from phishing list (labelled as [ph] in multi)
16 = comes from ob.surbl.org
32 = comes from ab.surbl.org
64 = comes from jp list

So a record in SC, WS, and JP would give a value 127.0.0.70.
One with WS, OB, and JP would resolve to 127.0.0.84, etc.
Programs using multi.surbl.org should be updated accordingly.
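
As a minimal sketch of the decoding described above, the following Python
function turns a 127.0.0.x result into the lists it represents.  The names
SURBL_BITS and decode_multi are made up for this example; only the bit
values come from the list above.

```python
# Bit values as given in the announcement.
SURBL_BITS = {
    2: "sc",
    4: "ws",
    8: "ph",
    16: "ob",
    32: "ab",
    64: "jp",
}


def decode_multi(address):
    """Return the list names encoded in a 127.0.0.x result from multi."""
    octets = address.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError("not a multi.surbl.org result: %r" % address)
    mask = int(octets[3])
    # Test each bit independently so that new lists do not disturb old ones.
    return [name for bit, name in sorted(SURBL_BITS.items()) if mask & bit]


print(decode_multi("127.0.0.70"))  # ['sc', 'ws', 'jp']
print(decode_multi("127.0.0.84"))  # ['ws', 'ob', 'jp']
```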

Since JP is currently included in WS, there will be 100%
overlap of JP entries in WS so that any record in JP will
also be in WS.  In other words about half of the WS records
in multi will increase by 64 due to overlap with JP.  But
WS will continue to use the 4 bit, as before.  If your
programs are decoding the multi results using the bit
positions, they should need no adjustments to continue to
handle the WS data.
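
The difference between decoding by bit position and comparing whole values
can be illustrated in Python.  The two helper functions are hypothetical
and stand in for whatever a program does today: the first breaks when a
WS record rises from 4 to 68, the second does not.

```python
WS_BIT = 4
JP_BIT = 64


def in_ws_exact(last_octet):
    # Fragile: only recognises a record that is in WS and in no other list.
    return last_octet == WS_BIT


def in_ws_bitwise(last_octet):
    # Robust: tests the WS bit and ignores every other bit.
    return bool(last_octet & WS_BIT)


before = WS_BIT           # a WS-only record before the change
after = WS_BIT | JP_BIT   # the same record once JP is added (68)

print(in_ws_exact(before), in_ws_exact(after))      # True False
print(in_ws_bitwise(before), in_ws_bitwise(after))  # True True
```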

We hope that 5 days is not too short notice for this kind of
change....  I will try to contact the developers of the various
(non-SA) programs separately to make sure they're aware of the
coming change.  Hopefully most of them are on this announcement
list however. 

We were not able to get JP as a separate list in yesterday's
SpamAssassin 3.0.0 full release, but we have gotten it into
SA 3.1 development.

For now the JP data will continue to be included in WS,
but just before SpamAssassin 3.1 gets released (probably in
6 months to a year from now), we will remove JP data from WS
to make them separate lists within multi.  This means that
SpamAssassin 3.0 and other current users of WS will continue
to get the benefits of JP under their default shipping
configurations, and that JP can also be used separately by
those who modify their configurations to take advantage of it.

In summary, we will:

1.  Add JP to multi.surbl.org on Monday September 27th.
(Note that like PH, JP would not be available as a separate
list, only as part of multi.)

2.  Keep the JP data in WS for now, so that regular 3.0 users
get the advantages of JP also (as part of WS).

3.  Ask the SpamAssassin developers to score JP separately in
SA 3.1.

4.  Remove JP from WS before the final SA 3.1 mass check and
re-scoring is done, to make the two lists more separate
for 3.1.  (Note that the separation is removal of the
specific subset arrangement suggested in #2.  If that is
done, there will still be some minor overlap of the records
in WS and JP.)

5.  Inform people about removing JP from WS before we do it,
so existing WS users can add JP, etc.

Please post follow up questions or comments to the SURBL discuss
list or to me personally.

Thanks,

Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04

Posted by Jeff Chan <je...@surbl.org>.
On Monday, September 27, 2004, 8:01:48 PM, Daniel Drucker wrote:
> On 2004-09-27, Jeff Chan <je...@surbl.org> wrote:
>> This is a reminder that we will be adding JP as a new list within
>> multi.surbl.org, as described in the previous announcement:

> http://www.surbl.org/quickstart.html is a little bit confusing due to
> its haphazard organization.

Yes, I'll probably rewrite it again at some point....

> Am I correct in thinking that if one wants to use the "multi" list
> under SA3.0, no action or configuration of any kind is needed? It is
> built in, and activated and positively scored by default?

Yes.  However JP is a new list, so you should probably add
a rule for it.  All the other SURBLs are already configured
and enabled in SA 3.0.0 by default.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04 [Scanned]

Posted by Marco Maske <ma...@netcologne.de>.
David Thurman wrote:

Excuse me, but shouldn't you do it as in /usr/share/spamassassin/score.cf

and add these 2 lines:
> I added this to our local.cf

ifplugin Mail::SpamAssassin::Plugin::URIDNSBL
> score URIBL_JP_SURBL    4.0
endif # Mail::SpamAssassin::Plugin::URIDNSBL


> Why are we not supposed to just add it to
> /usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?
>
> If not can I ask why we shouldn't (trying to understand sorry)

In this case, changing the files in /usr/share/spamassassin/* is the better
way, because most of us want this overwritten by SA 3.1.

This rule is already a default in SA 3.1 and higher, and will be scored by
the mass check with every update.

-- 
Ciao Marco, registered GNU/Linux-User 313353

No power to George W. Bush and his junta for exploitation, oppression,
world domination & 'BigBrother-watching'; don't buy U$ goods!

Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04[Scanned]

Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, September 29, 2004, 7:02:50 AM, David Thurman wrote:
> On 9/29/04 8:55 AM, "Christiaan den Besten" wrote:

>>> Why are we not supposed to just add it to
>>> /usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?
>> 
>> Because this file will be overwritten on the next update of SA.

> Ah so I see said the blind man. Understood, so this is something that is
> optional as far as SA is concerned.

> Thanks

For now, yes.  However a rule for JP has already been added
to the SA 3.1 branch, or so I heard.  :-)

I suppose that means you'll want to take out the one in
local.cf after upgrading to 3.1.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04[Scanned]

Posted by David Thurman <li...@webpresencegroup.net>.
On 9/29/04 8:55 AM, "Christiaan den Besten" wrote:

>> Why are we not supposed to just add it to
>> /usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?
> 
> Because this file will be overwritten on the next update of SA.

Ah so I see said the blind man. Understood, so this is something that is
optional as far as SA is concerned.

Thanks
-- 
David Thurman
The Web Presence Group
http://www.the-presence.com
Web Development/E-Commerce/CMS/Hosting/Dedicated Servers
800-399-6441/309-679-0774


Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04[Scanned]

Posted by Christiaan den Besten <ch...@scorpion.nl>.
> Why are we not supposed to just add it to
> /usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?

Because this file will be overwritten on the next update of SA.

bye,
Chris


Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04 [Scanned]

Posted by David Thurman <li...@webpresencegroup.net>.
On 9/27/04 10:09 PM, "Josh Trutwin" wrote:

> For me (install 3.0 from source) they are in
> /usr/share/spamassassin/25_uribl.cf.  There is a score file as well.  The new
> JP list is not there, so you could add it to local.cf until 3.1 is released.

Jumping in late here as we just installed SA 3.0 on Debian (Works excellent
with MS THANKS!! :)

I added this to our local.cf

urirhssub URIBL_JP_SURBL  multi.surbl.org.        A   64
header    URIBL_JP_SURBL  eval:check_uridnsbl('URIBL_JP_SURBL')
describe  URIBL_JP_SURBL  Contains a URL listed in JP at http://www.surbl.org/lists.html
tflags    URIBL_JP_SURBL  net

score URIBL_JP_SURBL    4.0

Why are we not supposed to just add it to
/usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?

If not can I ask why we shouldn't (trying to understand sorry)
-- 
David Thurman
The Web Presence Group
http://www.the-presence.com
Web Development/E-Commerce/CMS/Hosting/Dedicated Servers
800-399-6441/309-679-0774


Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04

Posted by Josh Trutwin <jo...@trutwins.homeip.net>.
On Tue, 28 Sep 2004 03:01:48 +0000 (UTC)
"Daniel M. Drucker" <dm...@3e.org> wrote:

> On 2004-09-27, Jeff Chan <je...@surbl.org> wrote:
> > This is a reminder that we will be adding JP as a new list within
> > multi.surbl.org, as described in the previous announcement:
> 
> 
> http://www.surbl.org/quickstart.html is a little bit confusing due
> to its haphazard organization.
> 
> Am I correct in thinking that if one wants to use the "multi" list
> under SA3.0, no action or configuration of any kind is needed? It is
> built in, and activated and positively scored by default?

For me (install 3.0 from source) they are in /usr/share/spamassassin/25_uribl.cf.  There is a score file as well.  The new JP list is not there, so you could add it to local.cf until 3.1 is released.

You'll forgive my verbiage, I just got stung by a #$@# wasp.

Josh

Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04

Posted by "Daniel M. Drucker" <dm...@3e.org>.
On 2004-09-27, Jeff Chan <je...@surbl.org> wrote:
> This is a reminder that we will be adding JP as a new list within
> multi.surbl.org, as described in the previous announcement:


http://www.surbl.org/quickstart.html is a little bit confusing due to
its haphazard organization.

Am I correct in thinking that if one wants to use the "multi" list
under SA3.0, no action or configuration of any kind is needed? It is
built in, and activated and positively scored by default?


-- 
Daniel Drucker / dmd@3e.org


REMINDER: Adding new JP list to multi.surbl.org on 9/27/04

Posted by Jeff Chan <je...@surbl.org>.
This is a reminder that we will be adding JP as a new list within
multi.surbl.org, as described in the previous announcement:

  http://lists.surbl.org/pipermail/announce/2004-September/000077.html

on Monday September 27th.  JP will have the bitmask value of 64,
which means about half of the WS records will have results that
increase by 64.  We'll probably make the change around close of
business U.S. East Coast time or around 22:00 UTC/GMT.

For now, JP records will continue to be included in WS, but
when SpamAssassin 3.1 gets released, the JP data will come
out of WS and these two will become separate lists within
multi.  Please update your programs accordingly.

SpamAssassin users won't need to make any changes to keep
using WS, but should probably add JP to their configurations
now so that they will be ready for the future change, and
also to gain the significant benefits of the separate JP
list now:

  http://www.surbl.org/quickstart.html
__

jp - jwSpamSpy + Prolocation data source

Joe Wein's jwSpamSpy program is used both by Joe's own systems
and also Raymond Dijkxhoorn and his colleagues at Prolocation to
process more than 300,000 likely spams per day. The resulting
list has a very good spam detection rate around 80% and a very
low false positive rate below 0.02%. This data is only available
in the combined list multi.surbl.org. 

An SA 2.63 and 2.64 rule and score using SpamCopURI 0.22 or later
looks like this: 

uri       JP_URI_RBL  eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+64')
describe  JP_URI_RBL  URI's domain appears in JP at http://www.surbl.org/lists.html
tflags    JP_URI_RBL  net

score     JP_URI_RBL  4.0

An SA 3.0 rule and score using URIBL's urirhssub looks like this:

urirhssub URIBL_JP_SURBL  multi.surbl.org.        A   64
header    URIBL_JP_SURBL  eval:check_uridnsbl('URIBL_JP_SURBL')
describe  URIBL_JP_SURBL  Contains a URL listed in JP at http://www.surbl.org/lists.html
tflags    URIBL_JP_SURBL  net

score URIBL_JP_SURBL    4.0
__
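
As a rough illustration of what the urirhssub line above asks for, the
Python sketch below builds the query name for a domain under
multi.surbl.org, resolves its A record, and tests the 64 bit.  The
function name listed_in_jp and the injectable resolve parameter are made
up for this example, and this is not how SpamAssassin itself implements
the check.

```python
import socket

JP_BIT = 64


def listed_in_jp(domain, resolve=socket.gethostbyname):
    """Return True if the domain's record in multi has the JP bit set."""
    query = domain.rstrip(".") + ".multi.surbl.org."
    try:
        address = resolve(query)
    except OSError:
        # Lookup failed; typically no such record, i.e. not listed.
        return False
    if not address.startswith("127.0.0."):
        return False
    last_octet = int(address.rsplit(".", 1)[1])
    return bool(last_octet & JP_BIT)


# Stand-in resolver so the example runs without touching the network.
print(listed_in_jp("example.com", resolve=lambda name: "127.0.0.68"))
```

Passing the resolver in as a parameter keeps the bit test separate from
the DNS lookup, so the logic can be exercised without live queries.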

JP has approximately the same spam detection and false positive
rates as OB and should probably be scored accordingly.  The
data are not the same however since JP uses different data
sources and Joe Wein's processing algorithms and inclusion
policies.

Jeff C.