Posted to users@spamassassin.apache.org by joea <jo...@j4computers.com> on 2012/04/13 00:23:30 UTC

auto add spam/ham for manual learning

I see where one can forward mail to a location and have SpamAssassin scan it.  For both spam and ham, I gather.

Wondering what settings to change to have it ignore any additional info the forward adds.   Yes, I am looking for a bit of spoon feeding here, as, well, it's been a long day.


Re: auto add spam/ham for manual learning

Posted by John Hardin <jh...@impsec.org>.
On Fri, 13 Apr 2012, Kris Deugau wrote:

> John Hardin wrote:
>>  The best you can do if you're doing forwarding for training is to
>>  require that the original ham/spam be forwarded as an RFC822 attachment,
>
> Outlook is not consistently capable of doing this correctly - worse, the 
> behaviour changes depending on whether you're forwarding one message or 
> more than one.  O_o

YGBFKM. But then, it's Outhouse, so I shouldn't be surprised.

> Either way, I wouldn't recommend auto-training on user-submitted mail unless 
> you have a small userbase you can talk to individually, in person.

...yeah, there's that, too. "Your users are out to destroy your network" 
is a healthy admin attitude, not paranoia. :)

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   A well educated Electorate, being necessary to the liberty of a
   free State, the Right of the People to Keep and Read Books,
   shall not be infringed.
-----------------------------------------------------------------------
  Today: Thomas Jefferson's 269th Birthday

Re: auto add spam/ham for manual learning

Posted by Kris Deugau <kd...@vianet.ca>.
John Hardin wrote:
> The best you can do if you're doing forwarding for training is to
> require that the original ham/spam be forwarded as an RFC822 attachment,

Some users with sane mail clients *can* be trained to do this - you just 
have to find the right instructions.

If they're all using Outlook, set up shared folders.  Outlook is not 
consistently capable of forwarding as an attachment correctly - worse, 
the behaviour changes depending on whether you're forwarding one message 
or more than one.  O_o

> and then have a mailbox preprocessing step that extracts the attachments
> and saves them in the "real" learning mail folders. This doesn't
> guarantee no lost or altered data, but it will minimize it. It's also
> possibly more technically advanced than your users will be comfortable
> with.

Another handy way to make this happen is to install a webmail suite that 
supports a "Report spam" option.  This lets users Do The Right Thing 
without having to do it by hand.

> If your mail server supports it, a better way is to define a couple of
> public/shared mail folders (e.g. public IMAP folders) and have people
> move missed spams and copy misclassified hams from their private folders
> to those public folders. This avoids the changes that forwarding makes
> to the message. Then train from those folders.

Either way, I wouldn't recommend auto-training on user-submitted mail 
unless you have a small userbase you can talk to individually, in 
person.  I have a handful of customers who report their entire, 
all-legitimate inbox as spam and don't seem to read the replies I 
occasionally send about it - I've regularly seen those same replies 
reported as spam.  :(

-kgd

Re: auto add spam/ham for manual learning

Posted by John Hardin <jh...@impsec.org>.
On Thu, 12 Apr 2012, joea wrote:

> I see where one can forward mail to a location and have SpamAssassin scan it.  For both spam and ham, I gather.
>
> Wondering what settings to change to have it ignore any additional info the forward adds.   Yes, I am looking for a bit of spoon feeding here, as, well, it's been a long day.

Forwards are unavoidably mangled and/or abridged as part of the forwarding 
process. There are no settings to undo that.

The best you can do if you're doing forwarding for training is to require 
that the original ham/spam be forwarded as an RFC822 attachment, and then 
have a mailbox preprocessing step that extracts the attachments and saves 
them in the "real" learning mail folders. This doesn't guarantee no lost 
or altered data, but it will minimize it. It's also possibly more 
technically advanced than your users will be comfortable with.
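
The preprocessing step described above could look roughly like the 
following. This is only a sketch using Python's standard email package; 
the function name is made up for illustration, and how you read the 
wrapper message from the submission mailbox and where you save the 
results is left to your setup. Re-serializing the inner message is not 
guaranteed to be byte-identical to what the user received.

```python
import email


def extract_forwarded_originals(wrapper_bytes):
    """Return the bytes of each message/rfc822 attachment in a forward.

    Hypothetical helper: the name and return shape are assumptions,
    not part of SpamAssassin.
    """
    wrapper = email.message_from_bytes(wrapper_bytes)
    found = []

    def collect(part):
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding
            # the single attached message.
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                found.append(payload[0].as_bytes())
            # Do not descend further: anything attached inside the
            # original belongs to the original.
            return
        if part.is_multipart():
            for sub in part.get_payload():
                collect(sub)

    collect(wrapper)
    return found
```

A forward that was not sent "as attachment" yields an empty list, which 
is a reasonable signal to bounce it back to the user with instructions.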

If your mail server supports it, a better way is to define a couple of 
public/shared mail folders (e.g. public IMAP folders) and have people move 
missed spams and copy misclassified hams from their private folders to 
those public folders. This avoids the changes that forwarding makes to 
the message. Then train from those folders.
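
Training from the shared folders can be scripted along these lines. A 
minimal sketch, assuming each shared folder is reachable on disk as a 
directory with one message per file, and that the script runs as the 
user who owns the Bayes database. The helper names and the paths in the 
usage note are made up; --spam and --ham are the standard sa-learn 
options.

```python
import subprocess


def build_sa_learn_command(kind, folder):
    """Build the sa-learn argument list for one training folder.

    kind is "spam" or "ham"; folder is the on-disk path of the shared
    folder (an assumption about how your mail store is laid out).
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, str(folder)]


def train(kind, folder):
    """Run sa-learn on the folder and raise if it exits non-zero."""
    return subprocess.run(build_sa_learn_command(kind, folder), check=True)
```

For example, train("spam", "/srv/mail/shared/missed-spam") followed by 
train("ham", "/srv/mail/shared/false-positives"), with both paths 
replaced by wherever your server keeps those folders.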

If you do that, you probably don't want to leave messages sitting in the 
shared ham folder; move them to a private-to-admin folder.

Either way, I don't recommend deleting any messages that were submitted 
for training - in other words, keep your training corpora. Having your 
training corpora allows you to troubleshoot and correct accidental (or 
malicious) mistraining, and allows you to wipe and retrain from scratch if 
you need to (e.g. if you lose your entire Bayes database for some reason).
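
One way to keep the corpora while still emptying the shared folders is 
to move each trained message into an admin-only archive. The following 
is a sketch under assumptions: one message per file, a made-up function 
name, and content-hash file names chosen here only so that a message 
submitted twice is stored once.

```python
import hashlib
import shutil
from pathlib import Path


def archive_messages(shared_dir, archive_dir):
    """Move every message file from shared_dir into archive_dir.

    Files are renamed to the SHA-256 of their content, so identical
    submissions collapse into a single archived copy. Returns the
    archive path used for each message, in sorted input order.
    """
    shared_dir = Path(shared_dir)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = []
    for msg in sorted(p for p in shared_dir.iterdir() if p.is_file()):
        digest = hashlib.sha256(msg.read_bytes()).hexdigest()
        target = archive_dir / (digest + ".eml")
        if target.exists():
            # An identical copy is already in the corpus; drop this one.
            msg.unlink()
        else:
            shutil.move(str(msg), str(target))
        archived.append(target)
    return archived
```

Keeping separate archive directories for spam and ham means a wipe and 
retrain is just a matter of running sa-learn over each of them again.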

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Gun Control laws cannot reduce violent crime, because gun control
   laws focus obsessively on a tool a criminal might use to commit a
   crime rather than the criminal himself and his act of violence.
-----------------------------------------------------------------------
  Tomorrow: Thomas Jefferson's 269th Birthday