Posted to users@spamassassin.apache.org by Florin Andrei <fl...@andrei.myip.org> on 2005/04/06 20:29:51 UTC

it's getting worse again

I've been using SA since... well, a long time ago, and one thing that i
noticed was a pattern in the way its efficiency varies: it's pretty good
soon after a new release, then it gets continuously worse; then a new
release and all of a sudden it's good again, then it starts "decaying"
again...

Well, it's been a while since the last release, and it's already
noticeably worse. I know this has been discussed before, i am aware of
the VirusScannerTypeUpdates FAQ entry, but you know what, from an end-
user's point of view, it does not matter. All that matters is that,
despite brilliant technical discussions, the efficiency is going down
and, if a new version is not released soon enough, the users start to
complain. This is what's happening right now.

I guess something has to change. "Then change it yourself" type of
advice will go straight to /dev/null, thank you, because as far as SA
is concerned, i'm just a user. I am merely pointing out the problem.

-- 
Florin Andrei

http://florin.myip.org/


Re: it's getting worse again

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Florin,

Wednesday, April 6, 2005, 11:29:51 AM, you wrote:

FA> I'm using SA since... well, a long time ago, and one thing that i
FA> noticed was a pattern in the way its efficiency varies: it's pretty good
FA> soon after a new release, then it gets continuously worse; then a new
FA> release and all of a sudden it's good again, then it starts "decaying"
FA> again...

FA> Well, it's been a while since the last release, and it's already
FA> noticeably worse. I know this has been discussed before, i am aware of
FA> the VirusScannerTypeUpdates FAQ entry, but you know what, from an end-
FA> user's point of view, it does not matter. All that matters is that,
FA> despite brilliant technical discussions, the efficiency is going down
FA> and, if a new version is not released soon enough, the users start to
FA> complain. This is what's happening right now.

FA> I guess something has to change. "Then change it yourself" type of
FA> advices will go straight to /dev/null, thank you, because as far as SA
FA> is concerned, i'm just a user. I am merely pointing out the problem.

That's one of the goals of SARE, to provide useful rule updates to
keep SpamAssassin's performance high even late in the cycle between
releases.

IMO we do very well.  My systems are still running 99.9% accurate at
this date (processing about 50k emails a week, 50/50 ham/spam).

To benefit from this work, you need to be able to judiciously apply
SARE updates whenever they come out.

Bob Menschel




RE: it's getting worse again

Posted by Tim Donahue <td...@haynes-group.com>.
On Wed, 2005-04-06 at 15:45 -0400, David Brodbeck wrote:
> OT: While it's not necessary to be an expert programmer to be a system
> administrator, you'll end up doing a lot of extra work if you don't have at
> least some minimal programming skills.  One of the joys of UNIX system
> administration is being able to write scripts to automate repetitive tasks.
> 
> On the other hand, I've known many expert programmers who made absolutely
> awful system administrators.  Places that require a CS degree of system
> administrator applicants have always puzzled me a little, for that reason.
> 

I'm going to have to agree with you on this last point. I know several
IT Directors / Company Owners who have banned their programmers from
the server room and from doing administrative tasks on the servers.

The reason: programmers don't always stop and consider who else
their actions are going to affect, whether because they don't
know what services other than theirs are hosted on the server, or out of
plain absentmindedness.

On the other hand, I know of some programmers who are excellent
administrators as well, so it really depends on the person more than
on the job they perform (or the title they hold).

Tim Donahue

[OT] Re: it's getting worse again

Posted by Niek <ni...@asbak.coding-slaves.com>.
On 4/6/2005 9:45 PM +0100, David Brodbeck wrote:
>>>Users should complain at their systems administrators.
>>>
>>>Niek
>>
>>Someone can be a sysadmin, and not be a programmer.  While the skill 
>>sets overlap, they're not necessarily one and the same.  Perhaps he 
>>meant user as in consumer? -Don
> 
> 
> I assumed that's what he meant, personally.
> 
> 
> OT: While it's not necessary to be an expert programmer to be a system
> administrator, you'll end up doing a lot of extra work if you don't have at
> least some minimal programming skills.  One of the joys of UNIX system
> administration is being able to write scripts to automate repetitive tasks.

Yeah, kind of.
System administrators usually manage to use Google for the problems they face.
Basically it's the following skill: knowing /where/ to search for possible answers.
(rules du jour, etc., would probably be an answer/solution for the OP)

The OP demonstrated he lacked this skill, so he should contact his systems administrator.

Niek
--

RE: it's getting worse again

Posted by David Brodbeck <gu...@gull.us>.
On Wed, 6 Apr 2005 15:08:31 -0400, Don Levey wrote
> Niek wrote:
> > On 4/6/2005 8:29 PM +0100, Florin Andrei wrote:
> >> I guess something has to change. "Then change it yourself" type of
> >> advices will go straight to /dev/null, thank you, because as far as
> >> SA is concerned, i'm just a user. I am merely pointing out the
> >> problem.
> >
> > Users should complain at their systems administrators.
> >
> > Niek
> 
> Someone can be a sysadmin, and not be a programmer.  While the skill 
> sets overlap, they're not necessarily one and the same.  Perhaps he 
> meant user as in consumer? -Don

I assumed that's what he meant, personally.


OT: While it's not necessary to be an expert programmer to be a system
administrator, you'll end up doing a lot of extra work if you don't have at
least some minimal programming skills.  One of the joys of UNIX system
administration is being able to write scripts to automate repetitive tasks.

On the other hand, I've known many expert programmers who made absolutely
awful system administrators.  Places that require a CS degree of system
administrator applicants have always puzzled me a little, for that reason.


RE: it's getting worse again

Posted by Don Levey <sp...@the-leveys.us>.
Niek wrote:
> On 4/6/2005 8:29 PM +0100, Florin Andrei wrote:
>> I guess something has to change. "Then change it yourself" type of
>> advices will go straight to /dev/null, thank you, because as far as
>> SA is concerned, i'm just a user. I am merely pointing out the
>> problem.
>
> Users should complain at their systems administrators.
>
> Niek

Someone can be a sysadmin, and not be a programmer.  While the skill sets
overlap, they're not necessarily one and the same.  Perhaps he meant user as
in consumer?
 -Don

Re: it's getting worse again

Posted by Niek <ni...@asbak.coding-slaves.com>.
On 4/6/2005 8:29 PM +0100, Florin Andrei wrote:
> I guess something has to change. "Then change it yourself" type of
> advices will go straight to /dev/null, thank you, because as far as SA
> is concerned, i'm just a user. I am merely pointing out the problem.

Users should complain at their systems administrators.

Niek
--

Re: it's getting worse again

Posted by Martin Hepworth <ma...@solid-state-logic.com>.
Florin

Depends on how well it's set up in the first place. The default rulesets
are a pretty good starting point, but I find I need to add quite a few
extra ones from www.rulesemporium.com etc. in order to get a reasonable
catch rate.

The URI-RBL from surbl.org has helped tremendously in providing a more
automatic update system, and the (my_)rules_du_jour from the SARE gang at
rulesemporium.com helps with their rules.

From what I hear, the developers have been discussing ways of
providing an auto-update mechanism, but from what I see they are trying to
lock it down, so you know the updates are from them and not spoofed.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


Florin Andrei wrote:
> I'm using SA since... well, a long time ago, and one thing that i
> noticed was a pattern in the way its efficiency varies: it's pretty good
> soon after a new release, then it gets continuously worse; then a new
> release and all of a sudden it's good again, then it starts "decaying"
> again...
> 
> Well, it's been a while since the last release, and it's already
> noticeably worse. I know this has been discussed before, i am aware of
> the VirusScannerTypeUpdates FAQ entry, but you know what, from an end-
> user's point of view, it does not matter. All that matters is that,
> despite brilliant technical discussions, the efficiency is going down
> and, if a new version is not released soon enough, the users start to
> complain. This is what's happening right now.
> 
> I guess something has to change. "Then change it yourself" type of
> advices will go straight to /dev/null, thank you, because as far as SA
> is concerned, i'm just a user. I am merely pointing out the problem.
> 


Re: Re[2]: it's getting worse again

Posted by Florin Andrei <fl...@andrei.myip.org>.
On Thu, 2005-04-07 at 11:24 -0400, Kevin Sullivan wrote:

> If you think about this, it isn't surprising.  At the time the mass-checks 
> ran for 3.0.2, the 3.0.2 rules caught almost all of the spam of that time. 
> (If they didn't, then people wrote rules until they did.)  So dynamic 
> systems like Bayes and SURBL didn't add much, and thus scored low.  This 
> will be the case during every release.
> But now, between releases, spammers write spam which evades the "standard" 
> rules.  Sure, there will be new "standard" rules for the next SA release, 
> but until then dynamic systems like Bayes and SURBL are all that are 
> catching some spam.

This looks like the smoking gun.

Wouldn't it make sense, then, to pre-emptively bump up the scores for the
dynamic systems right from the beginning? They will be slightly over-
rated at first, become perfectly rated after a while, then start getting
under-rated as more time passes.
That will certainly be better than the current situation, when the
dynamic rules are perfectly rated only at first, but immediately they
start to "decay" (well, not really, but i'm trying to find a concise
metaphor).

Or how about this:
In the big SA config file, add a parameter that controls the overall
weight of the static versus the dynamic things. Release the next SA
version with that parameter set so that it gives more importance to the
static systems. Tell users (in a prominent, obvious, even intrusive
fashion!) to adjust that parameter every now and then, to give more
importance to the dynamic systems as time passes.
Or even, heck, make SA track time and automatically increase the
importance of the dynamic systems as time passes. That will make it, of
course, one of those scary self-adaptive systems that pull the carpet
from under sysadmin's feet :-) but if it stops the spam, then who cares.

I believe you're right, that's what's causing me problems - the spammers
started to learn the static rules and are evading them. Well, if that's
true, then SA must provide a mechanism to control that. An overall
static-vs-dynamic "balance button" might be a good idea. Or not. <shrug>
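
The static-vs-dynamic "balance button" idea could be sketched roughly as
follows. This is only an illustration of the proposal, not an existing
SpamAssassin feature: the function names, the linear ramp, the 180-day
horizon and the maximum boost are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical tuning knobs; none of these exist in SpamAssassin.
RAMP_DAYS = 180   # days after a release until the boost is fully applied
MAX_BOOST = 1.5   # dynamic-rule scores are multiplied by up to this factor


def dynamic_weight(release_day: date, today: date) -> float:
    """Return a multiplier for dynamic-rule scores (Bayes, SURBL, ...).

    The multiplier is 1.0 on release day, grows linearly to MAX_BOOST
    once RAMP_DAYS have passed, and stays there afterwards.
    """
    age = max((today - release_day).days, 0)
    fraction = min(age / RAMP_DAYS, 1.0)
    return 1.0 + (MAX_BOOST - 1.0) * fraction


def total_score(static_hits, dynamic_hits, release_day, today):
    """Sum static scores as-is and dynamic scores scaled by release age."""
    weight = dynamic_weight(release_day, today)
    return sum(static_hits) + weight * sum(dynamic_hits)
```

With these made-up numbers, a dynamic hit worth 2.0 counts as 2.0 on
release day and as 3.0 six months later, while static hits keep their
original scores. A manual "button" would simply replace dynamic_weight()
with a value the admin sets by hand.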

I will try to bump up the Bayes rules and see where that goes.

Thanks everyone.

-- 
Florin Andrei

http://florin.myip.org/


Re[2]: it's getting worse again

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Florin,

Wednesday, April 6, 2005, 5:40:10 PM, you wrote:

FA> So what is the reason why BAYES_99 is scored so low?

The algorithm/process that determines scores came out with a low score
like that.  It seemed a good bet for this new version.  Many of us
have decided that it wasn't, and we've increased the score for it in
our systems.

One of the strengths of SA is its flexibility -- if you want to be
more aggressive, raise the score(s) or lower the threshold. If you
want to be more conservative, lower the score(s) or raise the
threshold. SA's scores are aimed at a fairly conservative target,
since false positives are horrendously worse than false negatives.
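
To make that trade-off concrete: SA adds up the scores of every rule a
message hits and tags the message as spam once the total reaches the
threshold. The sketch below is plain Python, not SA code, and the rule
scores in it are made-up illustrative values rather than those of any
release. Only the rule name BAYES_99 and the default threshold of 5 come
from this thread; EXAMPLE_BODY_RULE is a hypothetical name.

```python
def is_spam(hits, scores, threshold=5.0):
    """Return True when the summed scores of the hit rules reach the threshold."""
    total = sum(scores.get(rule, 0.0) for rule in hits)
    return total >= threshold


# Illustrative numbers only; not the scores shipped with any SA release.
scores = {"BAYES_99": 3.5, "EXAMPLE_BODY_RULE": 1.0}
hits = ["BAYES_99"]  # a message that hits BAYES_99 and nothing else

print(is_spam(hits, scores))                       # False: 3.5 < 5.0
print(is_spam(hits, {**scores, "BAYES_99": 5.4}))  # True: score raised
print(is_spam(hits, scores, threshold=3.0))        # True: threshold lowered
```

Raising one rule's score only affects messages that hit that rule, while
lowering the threshold makes every rule count for more, so the first is
usually the more conservative of the two knobs.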

>> I'm a little puzzled what you're asking for, then;  addon rulesets are
>> available from SARE, and somewhere there's a tool to automatically check
>> for updates on those rules.

FA> My impression, after a quick perusal, was that any mentions about SARE
FA> and the like are pretty well "hidden" on the SA main website.
FA> Yes, there is a mention, but there's a big fat "Use at your own risk"
FA> warning at the top of the page. What would a new user think?

A new user should think, and think twice, before using SARE rules.
A new user that doesn't read this list probably should think three or
four times before using SARE rules, and should do so slowly and
carefully, if at all, and only after reading the documentation within
those rules files.

A user who has read this list and sees how many people use SARE rules,
should also be capable of looking at the documentation within those
rules and deciding which ones might be worth trying. (And should
probably do so slowly and carefully anyway.)

>> If you're really not interested in tweaking your SA setup

FA> I've a fairly demanding job, i've a few pretty convoluted personal
FA> projects i'm involved in, i've a family and other details that typically
FA> show up if one is not an archetypal pale-faced geek-in-the-basement.
FA> I do try to take care of my personal webserver (to which i'm the sole
FA> admin), mailserver (SA, Postfix, Cyrus, Squirrelmail), VoIP PBX, etc.,
FA> despite the schedule overload.

FA> And these days i was looking at SA and i'm, like, "it's not gonna
FA> happen, i don't have time for this." I chose to play the dumb user on
FA> purpose, just because i can't fix everything myself.

I hold down a 50-60 hour work week, family, volunteer time for NPOs,
plus personal interests, and still find time to fight spam via SARE,
because it's that important to me. Personal preference.

If you don't want to spend the time required to tweak SA to a high
enough performance (you probably don't need the 99.9% accuracy I
want), then you can buy someone else's package and let them worry about
the tweaking.

There's a balance point -- some time invested vs some gain received.
The question becomes whether the gain received does balance the time
invested, and only you can answer that question.

Bob Menschel



Re: it's getting worse again

Posted by Kris Deugau <kd...@vianet.ca>.
Robert Menschel has already addressed most of your points pretty well.
<AOL>MEE too!!oneone!1!!</AOL>

Florin Andrei wrote:
> <sigh>
> 
> I've a fairly demanding job, i've a few pretty convoluted personal
> projects i'm involved in, i've a family and other details that
> typically show up if one is not an archetypal pale-faced
> geek-in-the-basement.  I do try to take care of my personal webserver
> (to which i'm the sole admin), mailserver (SA, Postfix, Cyrus,
> Squirrelmail), VoIP PBX, etc., despite the schedule overload.

Understood.  I happen to be in the position of doing this as a part of
my day job;  I administer a number of local servers for what used to be
a local ISP (bought out two years ago).

As I noted, however, I don't currently spend a whole lot of time
specifically tuning SA, because I've got a well-tuned setup (on the
ISP-account filter server and domain hosting server, according to
customers;  and on my own personal system) that needs all of about five
minutes attention to SA per *week*.  I've left all three systems running
SA 2.64, patched for SURBL lookups, because of this- I have no real need
to upgrade.

That said, all of those systems have been in more or less continuous
operation for several years now, and have had the benefit of quite a bit
of my time doing the tuning since I installed SA.  I'm also seeing far
less customer feedback;  whether that's due to lack of FPs and FNs on
most accounts (possible) or just nobody noticing (somewhat more likely,
sadly) I can't say.

Any good filter *will* take some time to get well-tuned for YOUR
particular mail flow.  :/

> And these days i was looking at SA and i'm, like, "it's not gonna
> happen, i don't have time for this." I chose to play the dumb user on
> purpose, just because i can't fix everything myself.

I know that feeling.  <g>

> I do apologize for not reporting the actual nature of the problem.

Ranting is allowed.  <g>  But if you really expect help, a brief summary
of what you think is wrong and what you've tried to fix the problem lets
others provide advice that may allow you to spend five minutes making a
VERY noticeable improvement in your setup.

Tweaking the Bayes and SURBL (aka URIRBL) scores will probably give you
the most visible, immediate improvement in your spam detection rates
with SA3.x without having to write or test rules or rulesets.

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!

Re: it's getting worse again

Posted by Florin Andrei <fl...@andrei.myip.org>.
On Wed, 2005-04-06 at 15:53 -0400, Kris Deugau wrote:

> This WILL HAPPEN if you rely entirely on static rules - spammers adjust
> their tactics to avoid those rules.  That's why dynamic rules or systems
> such as Bayes and SURBL are so important.

I religiously feed false negatives back into Bayes. I've a cron job
that's polling a special folder in my IMAP account (i wrote a Perl
script based on a CPAN IMAP module) and i just drag the spam there and
forget about it.
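
A feedback loop of this kind might look roughly like the sketch below. It
is not the Perl script mentioned above: it is a Python stand-in using the
standard imaplib module, and it assumes that sa-learn is on the PATH and
accepts a single message on standard input when no file is named. The
function names are invented for the example.

```python
import imaplib
import subprocess


def sa_learn_spam(raw_message: bytes) -> None:
    """Pipe one raw message to `sa-learn --spam` on standard input."""
    subprocess.run(["sa-learn", "--spam"], input=raw_message, check=True)


def learn_folder(conn, folder, learn=sa_learn_spam):
    """Feed every message in `folder` to `learn`, then delete it.

    `conn` is an already logged-in imaplib.IMAP4 (or IMAP4_SSL) object.
    Returns the number of messages processed.
    """
    conn.select(folder)
    _, data = conn.search(None, "ALL")
    count = 0
    for num in data[0].split():
        _, parts = conn.fetch(num, "(RFC822)")
        learn(parts[0][1])                      # raw RFC 822 bytes of the message
        conn.store(num, "+FLAGS", "\\Deleted")  # mark the message as handled
        count += 1
    conn.expunge()                              # remove the handled messages
    return count
```

A cron job would log in with imaplib.IMAP4_SSL, call learn_folder() on
whatever folder the spam gets dragged into, and log out. Passing the
learner in as a parameter keeps the IMAP handling testable without
touching the real Bayes database.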

> The most common detail in most other reports like yours (you don't say
> much beyond "It's broke.  Fix it.")

Increasing number of false negatives.

> is that spam is hitting BAYES_99....
> and nothing else.  In 2.6x, this wasn't a problem, BAYES_99 scored over
> the threshold of 5 in the default setup, and spam would be correctly
> tagged in that case.  With 3.x, the BAYES_nn scores have been rather
> reduced, and a number of people have reported good results from just
> copying the 2.64 BAYES_nn scores.

So what is the reason why BAYES_99 is scored so low?

> I'm a little puzzled what you're asking for, then;  addon rulesets are
> available from SARE, and somewhere there's a tool to automatically check
> for updates on those rules.

My impression, after a quick perusal, was that any mentions about SARE
and the like are pretty well "hidden" on the SA main website.
Yes, there is a mention, but there's a big fat "Use at your own risk"
warning at the top of the page. What would a new user think?

> If you're really not interested in tweaking your SA setup

<sigh>

I've a fairly demanding job, i've a few pretty convoluted personal
projects i'm involved in, i've a family and other details that typically
show up if one is not an archetypal pale-faced geek-in-the-basement.
I do try to take care of my personal webserver (to which i'm the sole
admin), mailserver (SA, Postfix, Cyrus, Squirrelmail), VoIP PBX, etc.,
despite the schedule overload.

And these days i was looking at SA and i'm, like, "it's not gonna
happen, i don't have time for this." I chose to play the dumb user on
purpose, just because i can't fix everything myself.

I do apologize for not reporting the actual nature of the problem.

-- 
Florin Andrei

http://florin.myip.org/


Re: it's getting worse again

Posted by Kris Deugau <kd...@vianet.ca>.
Florin Andrei wrote:
> I'm using SA since... well, a long time ago, and one thing that i
> noticed was a pattern in the way its efficiency varies: it's pretty
> good soon after a new release, then it gets continuously worse; then
> a new release and all of a sudden it's good again, then it starts
> "decaying" again...

I noticed this for several releases up to the 2.4x series; and to a
lesser degree into 2.5x and 2.6x.  However, I've reached a fairly stable
state with 2.64 (with the SpamCopURI "plugin"/patch) where I see maybe
two or three spams a week slipping through - at most.  I move those
messages to a "missed-spam" folder, and sa-learn that folder manually
every so often.

Bayes and the SURBL checks have *REALLY* made a noticeable difference in
long-term accuracy.  If it weren't for SURBL alone, actually, I probably
would have upgraded to 3.x by now.  I also happen to maintain a
"local-use-only" DNS zone that I refer to with the SURBL check;  but I
haven't added anything to it in several months.

> Well, it's been a while since the last release, and it's already
> noticeably worse. I know this has been discussed before, i am aware
> of the VirusScannerTypeUpdates FAQ entry, but you know what, from an
> end-user's point of view, it does not matter. All that matters is
> that, despite brilliant technical discussions, the efficiency is
> going down and, if a new version is not released soon enough, the
> users start to complain. This is what's happening right now.

This WILL HAPPEN if you rely entirely on static rules - spammers adjust
their tactics to avoid those rules.  That's why dynamic rules or systems
such as Bayes and SURBL are so important.  The program and rules
themselves don't have to change;  just the data source they work with. 
Manual feedback is NECESSARY for a well-adjusted Bayes system;  without
that feedback there's no way to guarantee that it won't behave
incorrectly on your email stream.

The SA devs could, in theory, release updated rules much more
quickly...  but then they'd be spending most of their time maintaining
and creating new rules, then going through the score-balancing process
to maximize spam detection while minimizing FPs across the official
ruleset - this is a much faster process these days, but it's still a
week-long process IIRC.  (As compared to ~6 weeks up until ~2.63 IIRC.)

The most common detail in most other reports like yours (you don't say
much beyond "It's broke.  Fix it.") is that spam is hitting BAYES_99....
and nothing else.  In 2.6x, this wasn't a problem, BAYES_99 scored over
the threshold of 5 in the default setup, and spam would be correctly
tagged in that case.  With 3.x, the BAYES_nn scores have been rather
reduced, and a number of people have reported good results from just
copying the 2.64 BAYES_nn scores.

> I guess something has to change. "Then change it yourself" type of
> advices will go straight to /dev/null, thank you, because as far as
> SA is concerned, i'm just a user. I am merely pointing out the
> problem.

I'm a little puzzled what you're asking for, then;  addon rulesets are
available from SARE, and somewhere there's a tool to automatically check
for updates on those rules.  ISP mail administrators should at least be
able to whitelist/blacklist email addresses (or provide a way for users
to do so for themselves), and better ones will have a way for users to
submit missed spam or FPs back to be whitelisted/blacklisted/learned by
Bayes/manually poked for possible local rules.

The core SA development team spends more time developing the code that
dissects the message and pulls out specific parts;  with 3.x anyone can
now (more) easily add more complex "rules" that aren't "just" simple
pattern matching but do things like counting occurrences of words or
letters - or more complex checks.  Quite a few SA "rules" rely on code
like this;  that code *can't* be quickly updated in the same way that
the SARE rulesets (for instance) can.

If you're really not interested in tweaking your SA setup, look into a
mail client with its own spam filter - Netscape/Mozilla of recent
versions have one that's pretty good, Apple Mail is supposedly pretty
good, IIRC KMail has one.  But ANY spam filter needs feedback on whether
the filter is working correctly - in the case of a mail program, it's
usually a few mouse clicks compared to the regex tweaking or arcane
command line magic required for SA.

If you're not the administrator of the system running SA on your mail,
talk to the person/organization that is and complain.

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!

RE: it's getting worse again

Posted by Don Levey <sp...@the-leveys.us>.
Florin Andrei wrote:
> I'm using SA since... well, a long time ago, and one thing that i
> noticed was a pattern in the way its efficiency varies: it's pretty
> good soon after a new release, then it gets continuously worse; then
> a new release and all of a sudden it's good again, then it starts
> "decaying" again...
>
> Well, it's been a while since the last release, and it's already
> noticeably worse. I know this has been discussed before, i am aware of
> the VirusScannerTypeUpdates FAQ entry, but you know what, from an end-
> user's point of view, it does not matter. All that matters is that,
> despite brilliant technical discussions, the efficiency is going down
> and, if a new version is not released soon enough, the users start to
> complain. This is what's happening right now.
>
> I guess something has to change. "Then change it yourself" type of
> advices will go straight to /dev/null, thank you, because as far as SA
> is concerned, i'm just a user. I am merely pointing out the problem.

How do you mean "getting worse"?
Are you saying that it's suddenly letting through messages that it would
have stopped previously?  Or that the spammers and their obfuscation
techniques are changing and now getting around the rulesets you're using?  I
understand that you're not in a position to make the code changes yourself,
but those that are need the details of your problem in order to be able to
fix it - or even diagnose it.

Perhaps you simply need new rules, or to update the rulesets you're already
using?  I'm not a coder either, but I may start down that road anyway.  Then
again, now that I've finally upgraded to 3.0.2, I'm finding/tagging over 99%
of the spam I'm getting, and blocking outright almost 50% (with a block
threshold of 18, no less).

 -Don