Posted to dev@spamassassin.apache.org by Jeff Chan <je...@surbl.org> on 2004/09/23 09:58:42 UTC
ANNOUNCE: Adding new JP list to multi.surbl.org
[Please post follow ups to the SURBL discuss list or to me.]
One of the distinct data sources currently feeding into
ws.surbl.org includes data from Joe Wein and Raymond Dijkxhoorn
with his colleagues at Prolocation. Raymond and Prolocation
are currently processing more than 300,000 potential spams per
day using Joe's jwSpamSpy server software and combining those
with Joe's own results. In addition to the data processing
software, Joe has an elaborate, thorough, and well-thought-out
set of inclusion criteria, which include age of domain
registration, manual checks, and other factors. The resulting
data are an extensive list of spam URI domains with a very
low false positive rate (hits on legitimate messages). We
are calling this resulting data JP for Joe Wein + Prolocation.
The bottom line is that JP (called PJ in the table below) has a
significantly lower false positive rate than WS while having
a similar spam detection rate, for example as measured against a
large set of corpora belonging to Theo Van Dinter of SpamAssassin:
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
 2424443  2357143    67300  0.972   0.00   0.00  (all messages)
 100.000  97.2241   2.7759  0.972   0.00   0.00  (all messages as %)
   7.595   7.8122   0.0045  0.999   1.00   0.00  URIBL_SC_SURBL
  76.754  78.9448   0.0178  1.000   0.80   0.00  URIBL_OB_SURBL
  77.230  79.4340   0.0208  1.000   0.60   1.00  URIBL_PJ_SURBL
   0.985   1.0126   0.0045  0.996   0.50   0.00  URIBL_AB_SURBL
  82.119  84.4600   0.1367  0.998   0.40   0.00  URIBL_WS_SURBL
   0.021   0.0216   0.0045  0.829   0.00   0.00  URIBL_PH_SURBL
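For anyone wanting to check the S/O column, here is a minimal
sketch in Python. It assumes S/O is computed as SPAM% divided by
(SPAM% + HAM%), which reproduces the rows tried here. The function
name is made up for illustration, and because the percentages in
the table are already rounded, a recomputed value can differ in
the last digit.

```python
# Assumption: S/O is the share of a rule's hits that fall on spam,
# computed from the SPAM% and HAM% columns of the table.
def spam_over_overall(spam_pct, ham_pct):
    """Return spam% / (spam% + ham%)."""
    return spam_pct / (spam_pct + ham_pct)


# (SPAM%, HAM%) pairs copied from the table above.
rows = {
    "URIBL_SC_SURBL": (7.8122, 0.0045),
    "URIBL_PJ_SURBL": (79.4340, 0.0208),
    "URIBL_WS_SURBL": (84.4600, 0.1367),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name}  {spam_over_overall(spam_pct, ham_pct):.3f}")
```

The higher HAM% of WS is what pulls its S/O below that of PJ, even
though WS hits slightly more spam.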
So we feel the data could usefully be broken out into a
separate list which could safely be scored higher than
WS. We also continue to work on improving the False Positive
rate of WS of course. We propose making JP a separate list
within multi.surbl.org, but *not* a standalone list like
jp.surbl.org, since it's a major effort to set up entirely
new lists and most people should be using multi now.
The main reason for announcing this change ahead of time
is to allow developers of the many programs (in addition to
SpamAssassin) now using SURBL data to update their code or
configurations to take into account that the result codes in
multi will be changing as a result of adding JP. JP would get
the 64 bitmask, as in:
2 = comes from sc.surbl.org
4 = comes from ws.surbl.org
8 = comes from phishing list (labelled as [ph] in multi)
16 = comes from ob.surbl.org
32 = comes from ab.surbl.org
64 = comes from jp list
So a record in SC, WS, and JP would give a value 127.0.0.70.
One with WS, OB, and JP would resolve to 127.0.0.84, etc.
Programs using multi.surbl.org should be updated accordingly.
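As a rough illustration of the decoding, here is a small Python
sketch that turns a multi result into the lists it names. The
bit values are the ones given above; the table and function names
are hypothetical and not taken from any existing program.

```python
# Bit values as listed in the announcement; the short names are labels only.
SURBL_BITS = {2: "sc", 4: "ws", 8: "ph", 16: "ob", 32: "ab", 64: "jp"}


def decode_multi(address):
    """Return the list names encoded in a 127.0.0.x multi result."""
    prefix, _, last_octet = address.rpartition(".")
    if prefix != "127.0.0":
        raise ValueError(f"not a multi result: {address}")
    value = int(last_octet)
    # A list is present when its bit is set in the last octet.
    return [name for bit, name in SURBL_BITS.items() if value & bit]


print(decode_multi("127.0.0.70"))  # ['sc', 'ws', 'jp']
print(decode_multi("127.0.0.84"))  # ['ws', 'ob', 'jp']
```

Testing each bit separately means a program written this way only
needs a new table entry when another list is added.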
Since JP is currently included in WS, there will be 100%
overlap of JP entries in WS so that any record in JP will
also be in WS. In other words about half of the WS records
in multi will increase by 64 due to overlap with JP. But
WS will continue to use the 4 bit, as before. If your
programs are decoding the multi results using the bit
positions, they should need no adjustments to continue to
handle the WS data.
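To make the difference concrete, here is a hedged Python sketch
contrasting a whole-value comparison with a bit test. Both
function names are invented for the example; only the values 4
and 64 come from the list above.

```python
WS_BIT = 4
JP_BIT = 64


def in_ws_by_equality(last_octet):
    # Fragile: only true when the record is in WS and in nothing else.
    return last_octet == WS_BIT


def in_ws_by_bit(last_octet):
    # Robust: true whenever the WS bit is set, whatever else is set.
    return bool(last_octet & WS_BIT)


ws_only = WS_BIT               # 127.0.0.4 before JP is added
ws_and_jp = WS_BIT | JP_BIT    # 127.0.0.68 once the same record is also in JP

print(in_ws_by_equality(ws_only), in_ws_by_equality(ws_and_jp))  # True False
print(in_ws_by_bit(ws_only), in_ws_by_bit(ws_and_jp))            # True True
```

Programs that compare the whole last octet are the ones that will
need updating when the JP bit appears.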
We hope that 5 days is not too short notice for this kind of
change.... I will try to contact the developers of the various
(non-SA) programs separately to make sure they're aware of the
coming change. Hopefully most of them are on this announcement
list however.
We were not able to get JP as a separate list in yesterday's
SpamAssassin 3.0.0 full release, but we have gotten it into
SA 3.1 development.
For now the JP data will continue to be included in WS,
but just before SpamAssassin 3.1 gets released (probably in
6 months to a year from now), we will remove JP data from WS
to make them separate lists within multi. This means that
SpamAssassin 3.0 and other current users of WS will continue
to get the benefits of JP under their default shipping
configurations, and that JP can also be used separately by
those who modify their configurations to take advantage of it.
In summary, we will:
1. Add JP to multi.surbl.org on Monday September 27th.
(Note that like PH, JP would not be available as a separate
list, only as part of multi.)
2. Keep the JP data in WS for now, so that regular 3.0 users
get the advantages of JP also (as part of WS).
3. Ask the SpamAssassin developers to score JP separately in
SA 3.1.
4. Remove JP from WS before the final SA 3.1 mass check and
re-scoring is done, to make the two lists more separate
for 3.1. (Note that the separation is removal of the
specific subset arrangement suggested in #2. If that is
done, there will still be some minor overlap of the records
in WS and JP.)
5. Inform people about removing JP from WS before we do it,
so existing WS users can add JP, etc.
Please post follow up questions or comments to the SURBL discuss
list or to me personally.
Thanks,
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04
Posted by Jeff Chan <je...@surbl.org>.
On Monday, September 27, 2004, 8:01:48 PM, Daniel Drucker wrote:
> On 2004-09-27, Jeff Chan <je...@surbl.org> wrote:
>> This is a reminder that we will be adding JP as a new list within
>> multi.surbl.org, as described in the previous announcement:
> http://www.surbl.org/quickstart.html is a little bit confusing due to
> its haphazard organization.
Yes, I'll probably rewrite it again at some point....
> Am I correct in thinking that if one wants to use the "multi" list
> under SA3.0, no action or configuration of any kind is needed? It is
> built in, and activated and positively scored by default?
Yes. However JP is a new list, so you should probably add
a rule for it. All the other SURBLs are already configured
and enabled in SA 3.0.0 by default.
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04 [Scanned]
Posted by Marco Maske <ma...@netcologne.de>.
David Thurman wrote:
Excuse me, shouldn't you add these 2 lines, as is done in
/usr/share/spamassassin/score.cf:
> I added this to our local.cf
ifplugin Mail::SpamAssassin::Plugin::URIDNSBL
> score URIBL_JP_SURBL 4.0
endif # Mail::SpamAssassin::Plugin::URIDNSBL
> Why are we not supposed to just add it to
> /usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?
>
> If not, can I ask why we shouldn't? (Trying to understand, sorry.)
In this case, changes in /usr/share/spamassassin/* are the better way,
because most of us want this overwritten in SA 3.1.
This rule is already a default in SA 3.1 and higher, and will be scored by
the mass check on every update.
--
Ciao Marco, registered GNU/Linux-User 313353
No power to George W. Bush and his junta for exploitation, oppression,
world domination & 'BigBrother-watching'; don't buy U$ goods!
Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04[Scanned]
Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, September 29, 2004, 7:02:50 AM, David Thurman wrote:
> On 9/29/04 8:55 AM, "Christiaan den Besten" wrote:
>>> Why are we not supposed to just add it to
>>> /usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?
>>
>> Because this file will be overwritten on the next update of SA.
> Ah so I see said the blind man. Understood, so this is something that is
> optional as far as SA is concerned.
> Thanks
For now, yes. However a rule for JP has already been added
to the SA 3.1 branch, or so I heard. :-)
I suppose that means you'll want to take out the one in
local.cf after upgrading to 3.1.
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
Re: REMINDER: Adding new JP list to multi.surbl.org on
9/27/04[Scanned]
Posted by David Thurman <li...@webpresencegroup.net>.
On 9/29/04 8:55 AM, "Christiaan den Besten" wrote:
>> Why are we not supposed to just add it to
>> /usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?
>
> Because this file will be overwritten on the next update of SA.
Ah so I see said the blind man. Understood, so this is something that is
optional as far as SA is concerned.
Thanks
--
David Thurman
The Web Presence Group
http://www.the-presence.com
Web Development/E-Commerce/CMS/Hosting/Dedicated Servers
800-399-6441/309-679-0774
Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04[Scanned]
Posted by Christiaan den Besten <ch...@scorpion.nl>.
> Why are we not supposed to just add it to
> /usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?
Because this file will be overwritten on the next update of SA.
bye,
Chris
Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04
[Scanned]
Posted by David Thurman <li...@webpresencegroup.net>.
On 9/27/04 10:09 PM, "Josh Trutwin" wrote:
> For me (install 3.0 from source) they are in
> /usr/share/spamassassin/25_uribl.cf. There is a score file as well. The new
> JP list is not there, so you could add it to local.cf until 3.1 is released.
Jumping in late here as we just installed SA 3.0 on Debian (Works excellent
with MS THANKS!! :)
I added this to our local.cf
urirhssub URIBL_JP_SURBL multi.surbl.org. A 64
header URIBL_JP_SURBL eval:check_uridnsbl('URIBL_JP_SURBL')
describe URIBL_JP_SURBL Contains a URL listed in JP at http://www.surbl.org/lists.html
tflags URIBL_JP_SURBL net
score URIBL_JP_SURBL 4.0
Why are we not supposed to just add it to
/usr/share/spamassassin/25_uribl.cf? As opposed to local.cf?
If not, can I ask why we shouldn't? (Trying to understand, sorry.)
--
David Thurman
The Web Presence Group
http://www.the-presence.com
Web Development/E-Commerce/CMS/Hosting/Dedicated Servers
800-399-6441/309-679-0774
Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04
Posted by Josh Trutwin <jo...@trutwins.homeip.net>.
On Tue, 28 Sep 2004 03:01:48 +0000 (UTC)
"Daniel M. Drucker" <dm...@3e.org> wrote:
> On 2004-09-27, Jeff Chan <je...@surbl.org> wrote:
> > This is a reminder that we will be adding JP as a new list within
> > multi.surbl.org, as described in the previous announcement:
>
>
> http://www.surbl.org/quickstart.html is a little bit confusing due
> to its haphazard organization.
>
> Am I correct in thinking that if one wants to use the "multi" list
> under SA3.0, no action or configuration of any kind is needed? It is
> built in, and activated and positively scored by default?
For me (install 3.0 from source) they are in /usr/share/spamassassin/25_uribl.cf. There is a score file as well. The new JP list is not there, so you could add it to local.cf until 3.1 is released.
You'll forgive my verbiage, I just got stung by a #$@# wasp.
Josh
Re: REMINDER: Adding new JP list to multi.surbl.org on 9/27/04
Posted by "Daniel M. Drucker" <dm...@3e.org>.
On 2004-09-27, Jeff Chan <je...@surbl.org> wrote:
> This is a reminder that we will be adding JP as a new list within
> multi.surbl.org, as described in the previous announcement:
http://www.surbl.org/quickstart.html is a little bit confusing due to
its haphazard organization.
Am I correct in thinking that if one wants to use the "multi" list
under SA3.0, no action or configuration of any kind is needed? It is
built in, and activated and positively scored by default?
--
Daniel Drucker / dmd@3e.org
REMINDER: Adding new JP list to multi.surbl.org on 9/27/04
Posted by Jeff Chan <je...@surbl.org>.
This is a reminder that we will be adding JP as a new list within
multi.surbl.org, as described in the previous announcement:
http://lists.surbl.org/pipermail/announce/2004-September/000077.html
on Monday September 27th. JP will have the bitmask value of 64,
which means about half of the WS records will have results that
increase by 64. We'll probably make the change around close of
business U.S. East Coast time or around 22:00 UTC/GMT.
For now, JP records will continue to be included in WS, but
when SpamAssassin 3.1 gets released, the JP data will come
out of WS and these two will become separate lists within
multi. Please update your programs accordingly.
SpamAssassin users won't need to make any changes to keep
using WS, but should probably add JP to their configurations
now so that they will be ready for the future change, and
also to gain the significant benefits of the separate JP
list now:
http://www.surbl.org/quickstart.html
__
jp - jwSpamSpy + Prolocation data source
Joe Wein's jwSpamSpy program is used both by Joe's own systems
and also Raymond Dijkxhoorn and his colleagues at Prolocation to
process more than 300,000 likely spams per day. The resulting
list has a very good spam detection rate around 80% and a very
low false positive rate below 0.02%. This data is only available
in the combined list multi.surbl.org.
An SA 2.63 and 2.64 rule and score using SpamCopURI 0.22 or later
looks like this:
uri JP_URI_RBL eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+64')
describe JP_URI_RBL URI's domain appears in JP at http://www.surbl.org/lists.html
tflags JP_URI_RBL net
score JP_URI_RBL 4.0
An SA 3.0 rule and score using URIBL's urirhssub looks like this:
urirhssub URIBL_JP_SURBL multi.surbl.org. A 64
header URIBL_JP_SURBL eval:check_uridnsbl('URIBL_JP_SURBL')
describe URIBL_JP_SURBL Contains a URL listed in JP at http://www.surbl.org/lists.html
tflags URIBL_JP_SURBL net
score URIBL_JP_SURBL 4.0
__
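For programs other than SpamAssassin, the same test can be
sketched in Python roughly as follows. This is an illustration
only: it assumes the lookup is an A record query for the URI's
domain prepended to multi.surbl.org, the function names and the
injectable resolver argument are hypothetical, and a real
implementation would first reduce the URI host to its registered
domain.

```python
import socket

JP_BIT = 64


def multi_query_name(domain, zone="multi.surbl.org"):
    """Build the name to look up for a URI domain (assumed scheme)."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed_in_jp(domain, resolver=socket.gethostbyname):
    """Return True if the multi result for the domain has the JP bit set."""
    try:
        address = resolver(multi_query_name(domain))
    except socket.gaierror:
        # No A record: the domain is not on any list in multi.
        return False
    prefix, _, last_octet = address.rpartition(".")
    if prefix != "127.0.0":
        # Anything else is not a list result, so ignore it.
        return False
    return bool(int(last_octet) & JP_BIT)


# Usage with a stand-in resolver, so no network access is needed here.
print(is_listed_in_jp("example.com", resolver=lambda name: "127.0.0.68"))  # True
```

Passing the resolver in as an argument keeps the bit test easy to
check without querying the live list.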
JP has approximately the same spam detection and false positive
rates as OB and should probably be scored accordingly. The
data are not the same however since JP uses different data
sources and Joe Wein's processing algorithms and inclusion
policies.
Jeff C.