Posted to users@spamassassin.apache.org by Chris Santerre <cs...@MerchantsOverseas.com> on 2005/01/24 21:55:48 UTC
Quick Review of MIT spam conference
Well it was a nice trip to MIT. Here is the quick and dirty of it:
Caveat: I missed the first 2 presentations. Damn traffic!
No new "WOW!" techniques were introduced. There was some decent data
analysis, but nothing that screamed it would flag a large volume of spam. The
techniques that were introduced seemed pretty CPU-intensive to me. IBM
looked to have a solid model but some parts of their system bothered me.
Particularly the ability, at least in their flow chart, to allow people to
train the global bayes DB. Lots of other good stats presented by numerous
people. But that was a lot of the conference, analysis of data.
Some ideas, IMHO, were ridiculous. Had these people posted their papers to
spam-l they would have been nailed to a cross. I truly believe these people
meant well, but had tunnel vision. "Regulation Instead of Stopping"
presented by the guys from Georgia had me biting my lip! The idea that a 3rd
party arbitrator should handle email requests set off so many red flags I
thought I was going to faint.
"Spam Kings" by Brian McWilliams. NICE book! I wrestled my free copy from
the pile. Done reading it by Sunday. Very low on technical explanations, but
good insight into spammers.
"You've Got Jail. Some First Hand Observations from the Jeremy Jaynes Spam
Trial" Jon Praed, Founding Partner, Internet Law Group. Stole the show!
Could have listened to him all day! Finally a lawyer I actually like ;)
Other legislative ideas seemed very flawed. VERY FLAWED. During the French
presentation, they were explaining how much better their laws were and how
they were working with EU and international groups. When someone asked
how many people had been prosecuted under this fantastic, almost-2-year-old
law and she answered "None," I think the entire auditorium whispered "NEXT!"
"Using Lexigraphical Distancing to Block Spam" Jonathan Oliver, Director of
Research, MailFrontier, Inc. This has been a recurring problem at a lot of
these conferences: a private company says it has a technique to fight spam,
but it's proprietary and they can't go into specific detail on how they do
it. And most likely it is patented, so it's not as helpful to the community.
Although I liked the idea, it also felt VERY CPU-intensive to me. I could see
their DB getting big very fast. Big whitelist as well!
Bayes, bayes, and more bayes. Everyone seemed to be talking about using
bayes in different ways. I was SO HAPPY to see a few others felt the same
way as I did. Bayes ROCKS for a private email account with a techie kind of
owner. But as a more global/luser solution, it just isn't going to work well
at all. And I don't see it scaling to something like AOL. (Not much does!)
The best part of the trip was the side discussions. Finally meeting people
face to face. Many people have the same thoughts and ideas that I have had.
For instance, some of the best minds in antispam make ZERO dollars from
fighting spam. I found it really interesting how many antis would like a
full-time career in it. This included a discussion of *hypothetically* going
to the dark side. No one was seriously considering it, but we discussed how
we knew the weaknesses of the current anti-spam measures. Very interesting
stuff.
Frankly I think a day of round-table discussion groups would have been even
better than having presentations. Ideas were debated during dinner and that
was a blast!
Also January in Cambridge. I thought antis were smart? :)
I've NEVER seen so many people sleep sitting up!
Thai desserts? I'm still not sure what was in it, but I swear there is a
squirrel somewhere who can't reproduce.
"Is your sake hot?" is a good pickup line.
The MIT train room is more than a train room. It's Tetris! ;)
As soon as it gets warmer, I owe someone up north a dinner! Warmer!
Hopefully I can make the spam conference on the wrong coa.... I mean West
coast.
How about those Patriots!! Dynasty baby!! :-)
Chris Santerre
System Admin and SARE/SURBL Ninja
http://www.rulesemporium.com
http://www.surbl.org
'It is not the strongest of the species that survives,
not the most intelligent, but the one most responsive to change.'
Charles Darwin
Re: sa-learn
Posted by Jeffrey Lee <je...@reflex8.com>.
Here is an example header:
X-Spam-Status: No, score=3.0 required=5.0 tests=AWL,CELL_PHONE_FREE,
HTML_90_100,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,HTML_TEXT_AFTER_BODY,
HTML_TEXT_AFTER_HTML,HTML_WEB_BUGS,MIME_HTML_ONLY autolearn=no
version=3.0.2
On Jan 26, 2005, at 10:59 AM, Matt Kettler wrote:
> At 11:47 AM 1/26/2005, Jeffrey Lee wrote:
>> I have been using sa-learn religiously with ALL spam and ham on my
>> server. However, I keep getting repeat spam with low scores. How can
>> I increase the sa-learn "points"? So that when I learn a message
>> instead of increasing some point by .1 or .2 it will increase by .5
>> or .6?
>
> Well, sa-learning a message doesn't really work by increasing the
> "points" of a message, although that's more-or-less the net effect.
>
> In short, you'll want to make sure your inbound messages are hitting
> BAYES_90 or higher, and increase the scores of those rules in your
> local.cf.
>
> Also, while you're at it, check for spam messages matching
> ALL_TRUSTED. If that's happening, check the archives on setting
> trusted_networks manually. That rule should *never* match spam but
> will if SA gets confused by your MTA config.
>
> If the spam messages are consistently hitting BAYES_99, sa-learning
> won't increase the score of that message further, but it does help SA
> recognize subtle changes over time in spam. So keep up the training as
> it will keep slight deviations from driving the bayes scores down and
> causing FN problems that way.
>
> When you sa-learn a message, SA learns that the words in that message
> are more likely to be in spam or ham than it previously knew. When new
> messages come in, SA looks at its database of words and calculates a
> spam probability based on the words in that message. It then matches
> that probability to one of the BAYES_* rules and that causes the score
> impact.
>
>
>
>
Re: sa-learn
Posted by Matt Kettler <mk...@evi-inc.com>.
At 12:08 PM 1/26/2005, Jeffrey Lee wrote:
>I understand that. How then does SA treat messages mainly made up of images?
Hmm, in the context of what, bayes?
SA treats all messages in more-or-less the same fashion. Embedded
image-based spams are only going to wind up matching bayes if the headers or
URIs are part of bayes's header learning. SA's bayes doesn't learn from
general HTML tags.
Really your best tools against image spams are SURBL (for ones that link
external websites), and DCC or razor (for ones with embedded images).
Also, the HTML percentage rules kick in here, but their scores are pretty
low these days.
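To make the SURBL suggestion concrete, here is a rough sketch of the lookup idea: pull hostnames out of the message body's URIs and form the DNS names you would query against the multi.surbl.org list. This is a toy illustration, not SA's actual code; real SA handles far more URI obfuscation, and a real implementation needs a proper TLD table instead of the naive "last two labels" guess below.

```python
import re

# Naive URI hostname extractor; real spam uses many obfuscation tricks
# (IP literals, userinfo tricks, redirectors) that this regex ignores.
URI_RE = re.compile(r'https?://([A-Za-z0-9.-]+)', re.IGNORECASE)

def surbl_query_names(body):
    """Form multi.surbl.org query names for each domain linked in body."""
    names = []
    for host in URI_RE.findall(body):
        parts = host.lower().rstrip('.').split('.')
        # Naive "registrable domain": last two labels. A real checker
        # needs a public-suffix/TLD table to get e.g. .co.uk right.
        domain = '.'.join(parts[-2:])
        query = domain + '.multi.surbl.org'
        if query not in names:
            names.append(query)
    return names

body = '<img src="http://img.example-spammer.com/pill.gif">'
print(surbl_query_names(body))  # prints ['example-spammer.com.multi.surbl.org']
```

A listed domain would then show up as an A record when you resolve that query name; an NXDOMAIN answer means it isn't on the list.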
Re: sa-learn
Posted by Martin Hepworth <ma...@solid-state-logic.com>.
Jeff
In that case you are best off using the URI RBLs from surbl.org.
There are also other rules at www.rulesemporium.com/rules.htm that check
for URI/OEM type stuff, if you haven't already got them.
--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300
Jeffrey Lee wrote:
> I understand that. How then does SA treat messages mainly made up of
> images?
>
> On Jan 26, 2005, at 10:59 AM, Matt Kettler wrote:
>
>> At 11:47 AM 1/26/2005, Jeffrey Lee wrote:
>>
>>> I have been using sa-learn religiously with ALL spam and ham on my
>>> server. However, I keep getting repeat spam with low scores. How can
>>> I increase the sa-learn "points"? So that when I learn a message
>>> instead of increasing some point by .1 or .2 it will increase by .5
>>> or .6?
>>
>>
>> Well, sa-learning a message doesn't really work by increasing the
>> "points" of a message, although that's more-or-less the net effect.
>>
>> In short, you'll want to make sure your inbound messages are hitting
>> BAYES_90 or higher, and increase the scores of those rules in your
>> local.cf.
>>
>> Also, while you're at it, check for spam messages matching
>> ALL_TRUSTED. If that's happening, check the archives on setting
>> trusted_networks manually. That rule should *never* match spam but
>> will if SA gets confused by your MTA config.
>>
>> If the spam messages are consistently hitting BAYES_99, sa-learning
>> won't increase the score of that message further, but it does help SA
>> recognize subtle changes over time in spam. So keep up the training as
>> it will keep slight deviations from driving the bayes scores down and
>> causing FN problems that way.
>>
>> When you sa-learn a message, SA learns that the words in that message
>> are more likely to be in spam or ham than it previously knew. When new
>> messages come in, SA looks at its database of words and calculates a
>> spam probability based on the words in that message. It then matches
>> that probability to one of the BAYES_* rules and that causes the score
>> impact.
>>
>>
>>
>>
>
Re: sa-learn
Posted by Jeffrey Lee <je...@reflex8.com>.
I understand that. How then does SA treat messages mainly made up of
images?
On Jan 26, 2005, at 10:59 AM, Matt Kettler wrote:
> At 11:47 AM 1/26/2005, Jeffrey Lee wrote:
>> I have been using sa-learn religiously with ALL spam and ham on my
>> server. However, I keep getting repeat spam with low scores. How can
>> I increase the sa-learn "points"? So that when I learn a message
>> instead of increasing some point by .1 or .2 it will increase by .5
>> or .6?
>
> Well, sa-learning a message doesn't really work by increasing the
> "points" of a message, although that's more-or-less the net effect.
>
> In short, you'll want to make sure your inbound messages are hitting
> BAYES_90 or higher, and increase the scores of those rules in your
> local.cf.
>
> Also, while you're at it, check for spam messages matching
> ALL_TRUSTED. If that's happening, check the archives on setting
> trusted_networks manually. That rule should *never* match spam but
> will if SA gets confused by your MTA config.
>
> If the spam messages are consistently hitting BAYES_99, sa-learning
> won't increase the score of that message further, but it does help SA
> recognize subtle changes over time in spam. So keep up the training as
> it will keep slight deviations from driving the bayes scores down and
> causing FN problems that way.
>
> When you sa-learn a message, SA learns that the words in that message
> are more likely to be in spam or ham than it previously knew. When new
> messages come in, SA looks at its database of words and calculates a
> spam probability based on the words in that message. It then matches
> that probability to one of the BAYES_* rules and that causes the score
> impact.
>
>
>
>
Re: sa-learn
Posted by Matt Kettler <mk...@evi-inc.com>.
At 11:47 AM 1/26/2005, Jeffrey Lee wrote:
>I have been using sa-learn religiously with ALL spam and ham on my server.
>However, I keep getting repeat spam with low scores. How can I increase
>the sa-learn "points"? So that when I learn a message instead of
>increasing some point by .1 or .2 it will increase by .5 or .6?
Well, sa-learning a message doesn't really work by increasing the "points"
of a message, although that's more-or-less the net effect.
In short, you'll want to make sure your inbound messages are hitting
BAYES_90 or higher, and increase the scores of those rules in your local.cf.
Also, while you're at it, check for spam messages matching ALL_TRUSTED. If
that's happening, check the archives on setting trusted_networks manually.
That rule should *never* match spam but will if SA gets confused by your
MTA config.
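For example, a local.cf along these lines would raise the bayes rule scores and pin down your trusted relays. The score values and the 192.168.0.0/24 network below are placeholders; tune them for your own site.

```
# local.cf -- illustrative values only, adjust for your mail volume
score BAYES_90 3.5
score BAYES_99 5.0

# Declare your own MTA/relays so ALL_TRUSTED can't fire on outside mail.
# 192.168.0.0/24 stands in for your actual network range.
trusted_networks 192.168.0.0/24
```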
If the spam messages are consistently hitting BAYES_99, sa-learning won't
increase the score of that message further, but it does help SA recognize
subtle changes over time in spam. So keep up the training as it will keep
slight deviations from driving the bayes scores down and causing FN
problems that way.
When you sa-learn a message, SA learns that the words in that message are
more likely to be in spam or ham than it previously knew. When new messages
come in, SA looks at its database of words and calculates a spam
probability based on the words in that message. It then matches that
probability to one of the BAYES_* rules and that causes the score impact.
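Conceptually, that train-then-bucket loop looks something like the toy sketch below. This is a simplified Graham-style illustration, not SA's real implementation (SA combines per-token probabilities with a chi-squared method and tokenizes far more carefully); the counts, tokens, and cutoffs here are made up for the example.

```python
import math

def token_spam_prob(token, spam_counts, ham_counts, nspam, nham):
    """P(spam | token) estimated from training counts."""
    s = spam_counts.get(token, 0) / max(nspam, 1)
    h = ham_counts.get(token, 0) / max(nham, 1)
    if s + h == 0:
        return 0.5  # never-seen token carries no evidence
    return s / (s + h)

def message_spam_prob(tokens, spam_counts, ham_counts, nspam, nham):
    """Combine per-token probabilities in log-odds space (naive Bayes)."""
    log_odds = 0.0
    for t in set(tokens):
        p = token_spam_prob(t, spam_counts, ham_counts, nspam, nham)
        p = min(max(p, 0.01), 0.99)  # clamp so one token can't dominate
        log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))

# Cutoffs roughly mirroring the BAYES_* rule buckets
BUCKETS = [(0.99, "BAYES_99"), (0.90, "BAYES_90"),
           (0.50, "BAYES_50"), (0.0, "BAYES_00")]

def bayes_rule(prob):
    """Map a spam probability onto a BAYES_* rule name."""
    for cutoff, rule in BUCKETS:
        if prob >= cutoff:
            return rule

# "Training" is just incrementing token counts; that is the whole
# effect of sa-learn in this toy model.
spam_counts = {"viagra": 40, "free": 30}
ham_counts = {"meeting": 25, "free": 5}
p = message_spam_prob(["viagra", "free", "now"], spam_counts, ham_counts, 50, 50)
print(bayes_rule(p))  # prints BAYES_99
```

This is also why training a message that already hits BAYES_99 doesn't push its score higher: the probability is already in the top bucket, and only the fixed score of that rule counts toward the total.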
sa-learn
Posted by Jeffrey Lee <je...@reflex8.com>.
I have been using sa-learn religiously with ALL spam and ham on my
server. However, I keep getting repeat spam with low scores. How can I
increase the sa-learn "points"? So that when I learn a message instead
of increasing some point by .1 or .2 it will increase by .5 or .6?
Thanks,
Jeffrey Lee