Posted to users@spamassassin.apache.org by Chris Santerre <cs...@MerchantsOverseas.com> on 2005/01/24 21:55:48 UTC

Quick Review of MIT spam conference

Well it was a nice trip to MIT. Here is the quick and dirty of it:

Caveat: I missed the first two presentations. Damn traffic!

No new "WOW!" techniques were introduced. There was some decent data
analysis, but nothing that screamed it would flag a large volume of spam. The
techniques that were introduced seemed pretty CPU intensive to me. IBM
looked to have a solid model, but some parts of their system bothered me,
particularly the ability, at least in their flow chart, to let people
train the global bayes DB. Lots of other good stats were presented by
numerous people. But that was a lot of the conference: analysis of data.

Some ideas, IMHO, were ridiculous. Had these people posted their papers to
spam-l they would have been nailed to a cross. I truly believe these people
meant well, but had tunnel vision. "Regulation Instead of Stopping"
presented by the guys from Georgia had me biting my lip! The idea that a 3rd
party arbitrator should handle email requests set off so many red flags I
thought I was going to faint. 

"Spam Kings" by Brian McWilliams. NICE book! I wrestled my free copy from
the pile. Done reading it by Sunday. Very low on technical explanations, but
good insight into spammers. 

"You've Got Jail. Some First Hand Observations from the Jeremy Jaynes Spam
Trial" Jon Praed, Founding Partner, Internet Law Group. Stole the show!
Could have listened to him all day! Finally a lawyer I actually like ;)  

Other legislative ideas seemed very flawed. VERY FLAWED. During the French
presentation, they were explaining how much better their laws were and how
they were working with EU and international groups. When someone asked
how many people had been prosecuted under this fantastic, almost 2 year old
law, and the presenter answered "None", I think the entire auditorium
whispered "NEXT!"


"Using Lexigraphical Distancing to Block Spam" Jonathan Oliver, Director of
Research, MailFrontier, Inc. This seems to be an interesting problem at
a lot of the conferences. A private company says they have a technique to
fight spam, but it's proprietary and they can't go into specific detail on
how they do it. And most likely it is patented. So it's not as helpful to the
community. Although I liked the idea, it also felt VERY intensive to me. I
could see their DB getting big very fast. Big whitelist as well!

Bayes, bayes, and more bayes. Everyone seemed to be talking about using
bayes in different ways. I was SO HAPPY to see a few others felt the same
way as I did. Bayes ROCKS for a private email account with a techie kind of
owner. But as a more global/luser solution, it just isn't going to work well
at all. And I don't see it scaling to something like AOL. (Not much does!)

The best part of the trip was the side discussions. Finally meeting people
face to face. Many people have the same thoughts and ideas that I have had.
For instance, some of the best minds in antispam make ZERO dollars from
fighting spam. I found it really interesting how many antis would like a full
time career in it. This included the discussion of *hypothetically* going to
the dark side. No one was seriously considering it, but we discussed how we
knew the weaknesses of the current anti-spam measures. Very interesting stuff.

Frankly I think a day of round table discussion groups would have been even
better than having presentations. Ideas were debated during dinner and that
was a blast!

Also January in Cambridge. I thought antis were smart? :) 
I've NEVER seen so many people sleep sitting up!
Thai desserts? I'm still not sure what was in it, but I swear there is a
squirrel somewhere who can't reproduce. 
"Is your sake hot?" is a good pickup line.
The MIT train room is more than a train room. It's Tetris! ;) 
As soon as it gets warmer, I owe someone up north a dinner! Warmer!

Hopefully I can make the spam conference on the wrong coa.... I mean West
coast. 

How about those Patriots!! Dynasty baby!! :-) 

Chris Santerre 
System Admin and SARE/SURBL Ninja
http://www.rulesemporium.com
http://www.surbl.org
'It is not the strongest of the species that survives,
not the most intelligent, but the one most responsive to change.'
Charles Darwin

Re: sa-learn

Posted by Jeffrey Lee <je...@reflex8.com>.

Here is an example header:

X-Spam-Status: No, score=3.0 required=5.0 tests=AWL,CELL_PHONE_FREE, 
HTML_90_100,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,HTML_TEXT_AFTER_BODY, 
HTML_TEXT_AFTER_HTML,HTML_WEB_BUGS,MIME_HTML_ONLY autolearn=no  
version=3.0.2
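
For anyone who wants to check headers like this one in bulk, here is a
minimal Python sketch that unfolds an X-Spam-Status header and lists the
rules that fired. The function name parse_spam_status is made up for
illustration, and the code assumes the score=/required=/tests= layout shown
above and nothing else. On this header it finds no BAYES_* rule among the
tests, which may mean bayes is not contributing to the score of this message.

```python
import re

# The header from this message, folded across lines as it arrived.
EXAMPLE = """X-Spam-Status: No, score=3.0 required=5.0 tests=AWL,CELL_PHONE_FREE,
 HTML_90_100,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,HTML_TEXT_AFTER_BODY,
 HTML_TEXT_AFTER_HTML,HTML_WEB_BUGS,MIME_HTML_ONLY autolearn=no
 version=3.0.2"""


def parse_spam_status(header):
    """Parse a (possibly folded) X-Spam-Status header into a dict.

    Hypothetical helper: it only understands the layout shown in EXAMPLE.
    """
    # Unfold continuation lines and collapse runs of whitespace.
    value = " ".join(header.split())
    # Drop the header name if the caller passed the whole line.
    value = re.sub(r"^X-Spam-Status:\s*", "", value, flags=re.IGNORECASE)
    # Folding can leave a space after a comma inside the tests list.
    value = re.sub(r",\s+", ",", value)
    # Everything before the first comma is the Yes/No verdict.
    verdict, _, rest = value.partition(",")
    fields = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
    tests = [t for t in fields.get("tests", "").split(",") if t and t != "none"]
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": tests,
        "autolearn": fields.get("autolearn"),
        # Rules whose names start with BAYES_ come from the bayes classifier.
        "bayes_hits": [t for t in tests if t.startswith("BAYES_")],
    }


status = parse_spam_status(EXAMPLE)
print(status["score"], status["required"], status["bayes_hits"])
```

An empty bayes_hits list on messages that have already been through
sa-learn is the thing to look for.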

Re: sa-learn

Posted by Matt Kettler <mk...@evi-inc.com>.

At 12:08 PM 1/26/2005, Jeffrey Lee wrote:
>I understand that. How then does SA treat messages mainly made up of images?

Hmm, in the context of what, bayes?

SA treats all messages in more-or-less the same fashion. Embedded-image 
spams are only going to wind up matching bayes if their headers or URIs 
are things bayes has already learned. SA's bayes doesn't learn from general 
HTML tags.

Really your best tools against image spams are SURBL (for ones that link 
to external websites), and DCC or razor (for ones with embedded images).

Also, the HTML percentage rules kick in here, but their scores are pretty 
low these days.
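
As a rough illustration of why SURBL still works on image-only spam, here
is a minimal Python sketch of the first half of a URI blocklist check:
pulling hostnames out of the message body and turning them into DNS names
to look up. Everything here is an assumption for illustration: the function
name, the zone name multi.surbl.org as the default, and the "last two
labels" shortcut for finding the registered domain (a real checker needs a
list of two-level TLDs such as co.uk). The DNS query itself is left out.

```python
import re
from typing import List

# Hostname part of any http/https URL found in the text.
URL_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def surbl_query_names(body: str, zone: str = "multi.surbl.org") -> List[str]:
    """Return the DNS names a URI blocklist check would look up.

    Hypothetical helper: the domain reduction is deliberately naive.
    """
    names = []
    for host in URL_RE.findall(body):
        labels = host.lower().strip(".").split(".")
        # Skip bare hostnames and numeric IP addresses in this sketch.
        if len(labels) < 2 or labels[-1].isdigit():
            continue
        # Naive registered-domain guess: keep the last two labels.
        domain = ".".join(labels[-2:])
        query = domain + "." + zone
        if query not in names:
            names.append(query)
    return names


SAMPLE = (
    '<img src="cid:part1"> <a href="http://www.example.com/buy">click</a> '
    "or visit http://pills.example.net/x and http://shop.example.com/"
)
names = surbl_query_names(SAMPLE)
print(names)
```

A domain is treated as listed when the lookup of one of these names returns
an address, so the image content of the message never has to be examined.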

Re: sa-learn

Posted by Martin Hepworth <ma...@solid-state-logic.com>.

Jeff

In that case you are best off using the URI RBLs from surbl.org.

There are also other rules at www.rulesemporium.com/rules.htm that check 
for URI/OEM type stuff, if you haven't already got them.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


Re: sa-learn

Posted by Jeffrey Lee <je...@reflex8.com>.

I understand that. How then does SA treat messages mainly made up of 
images?

Re: sa-learn

Posted by Matt Kettler <mk...@evi-inc.com>.

At 11:47 AM 1/26/2005, Jeffrey Lee wrote:
>I have been using sa-learn religiously with ALL spam and ham on my server. 
>However, I keep getting repeat spam with low scores. How can I increase 
>the sa-learn "points"? So that when I learn a message instead of 
>increasing some point by .1 or .2 it will increase by .5 or .6?

Well, sa-learning a message doesn't really work by increasing the "points" 
of a message, although that's more-or-less the net effect.

In short, you'll want to make sure your inbound messages are hitting 
BAYES_90 or higher, and increase the scores of those rules in your local.cf.
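
To make the arithmetic concrete, here is a minimal Python sketch of how
raising one rule's score changes the verdict. The per-rule scores below are
invented for illustration and are not SpamAssassin's defaults; the only
things taken from this thread are the rule names and the required=5.0
threshold. In local.cf the override would be a line of the form
"score BAYES_99 3.5".

```python
REQUIRED = 5.0  # threshold from the "required=5.0" in the example header


def total_score(hits, scores):
    """Sum the scores of the rules that fired (unknown rules count as 0)."""
    return round(sum(scores.get(rule, 0.0) for rule in hits), 1)


# Hypothetical scores, NOT the shipped defaults.
stock_scores = {
    "HTML_MESSAGE": 0.1,
    "MIME_HTML_ONLY": 1.2,
    "HTML_WEB_BUGS": 0.7,
    "BAYES_99": 1.9,
}

# The same table with a local override for the bayes rule.
local_scores = dict(stock_scores, BAYES_99=3.5)

hits = ["HTML_MESSAGE", "MIME_HTML_ONLY", "HTML_WEB_BUGS", "BAYES_99"]

before = total_score(hits, stock_scores)
after = total_score(hits, local_scores)
print(before, before >= REQUIRED)
print(after, after >= REQUIRED)
```

The same set of rule hits goes from under the threshold to over it, which
is the "bigger points" effect asked about, obtained by changing the rule
score rather than by changing how sa-learn works.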

Also, while you're at it, check for spam messages matching ALL_TRUSTED. If 
that's happening, check the archives on setting trusted_networks manually. 
That rule should *never* match spam but will if SA gets confused by your 
MTA config.

If the spam messages are consistently hitting BAYES_99, sa-learning won't 
increase the score of that message further, but it does help SA recognize 
subtle changes over time in spam. So keep up the training as it will keep 
slight deviations from driving the bayes scores down and causing FN 
problems that way.

When you sa-learn a message, SA learns that the words in that message are 
more likely to be in spam or ham than it previously knew. When new messages 
come in, SA looks at its database of words and calculates a spam 
probability based on the words in that message. It then matches that 
probability to one of the BAYES_* rules and that causes the score impact.
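
The learn-then-look-up cycle described above can be sketched in a few lines
of Python. This is a toy naive Bayes classifier, not SA's actual
implementation: the class name, the whitespace tokenizer, the smoothing and
the way token probabilities are combined are all simplifications chosen for
illustration, and the cut-offs in BUCKETS are made up rather than taken
from SA's rule definitions.

```python
from collections import Counter
from math import prod


class TinyBayes:
    """Toy word-count classifier (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        # Count each distinct word once per learned message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_prob(self, token):
        # Smoothed share of spam vs. ham messages containing the word.
        s = (self.spam_tokens[token] + 1) / (self.nspam + 2)
        h = (self.ham_tokens[token] + 1) / (self.nham + 2)
        return s / (s + h)

    def spam_probability(self, text):
        # Naive combination: treat every word as independent evidence.
        probs = [self.token_prob(t) for t in set(text.lower().split())]
        p_spam = prod(probs)
        p_ham = prod(1 - p for p in probs)
        return p_spam / (p_spam + p_ham)


# Illustrative cut-offs only; SA defines its own ranges for each rule.
BUCKETS = [(0.99, "BAYES_99"), (0.90, "BAYES_90"), (0.50, "BAYES_50"), (0.0, "BAYES_00")]


def bayes_rule(p):
    """Map a probability to the first bucket whose cut-off it reaches."""
    for threshold, name in BUCKETS:
        if p >= threshold:
            return name


db = TinyBayes()
db.learn("cheap pills online", is_spam=True)
db.learn("cheap watches online", is_spam=True)
db.learn("meeting notes attached", is_spam=False)
db.learn("lunch meeting tomorrow", is_spam=False)

before = db.spam_probability("cheap pills")
db.learn("cheap pills", is_spam=True)  # one more round of training
after = db.spam_probability("cheap pills")
ham_like = db.spam_probability("meeting notes")
print(before, after, ham_like)
```

Learning the message again nudges the probability upward instead of adding
a fixed number of points, and the score only changes when the probability
crosses into a different BAYES_* bucket.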

sa-learn

Posted by Jeffrey Lee <je...@reflex8.com>.

I have been using sa-learn religiously with ALL spam and ham on my 
server. However, I keep getting repeat spam with low scores. How can I 
increase the sa-learn "points"? So that when I learn a message instead 
of increasing some point by .1 or .2 it will increase by .5 or .6?

Thanks,
Jeffrey Lee