Posted to users@spamassassin.apache.org by Zulma Pape <pa...@gmail.com> on 2018/01/23 10:55:46 UTC

Using Cloud AutoML as an AI for an Anti-spam filter ?

Hi,

I have just read about the Cloud AutoML and how Google made it possible for
users to train their own custom machine learning algorithms from scratch.

So the first thing that came to my mind is: can we use this service to
build our own spam filter based on our users' experience?

In other words, can we integrate Cloud AutoML into our server's spam
filter and make it behave the same way Gmail behaves?

Thank you

Re: Using Cloud AutoML as an AI for an Anti-spam filter ?

Posted by Benny Pedersen <me...@junc.eu>.
Zulma Pape wrote on 2018-01-23 11:55:

> I have just read about the Cloud AutoML and how Google made it
> possible for users to train their own custom machine learning
> algorithms from scratch.

+1

> So the first thing that I got in my mind is, can we use this service
> to build our own Spam Filter based on the users experience.

dspam already does this; nothing beats Bayes engines

> In other words, can we integrate the Cloud AutoML into our server's
> spam filter and make it behave the same way Gmail behave ?

if that's possible, I'd like to build a lotto coupon with only winning
numbers :)

Re: Using Cloud AutoML as an AI for an Anti-spam filter ?

Posted by John Hardin <jh...@impsec.org>.
On Tue, 23 Jan 2018, Dave Warren wrote:

> On Tue, Jan 23, 2018, at 02:55, Zulma Pape wrote:
>> In other words, can we integrate the Cloud AutoML into our server's
>> spam filter and make it behave the same way Gmail behave ?
> In short, not without a *lot* of work.
>
> Gmail implements a lot more complexity, and they have a lot more data
> than you. One example is that they track user interaction with email,
> things like what messages does a user delete without reading, what
> messages are opened and for how long, are links clicked, replies
> generated, etc.
> They also have a very wide view of all the email around the world, and
> therefore are very likely to spot new botnets, changes in spammer
> techniques, and also changes in legitimate mail far faster than almost
> anyone else.

<rant type="RCOB">And yet they (google/gmail) still bounce spam/phish 
evidence attachments sent to abuse@ mailboxes they host.</rant>

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Individual liberties are always "loopholes" to absolute authority.
-----------------------------------------------------------------------
  Today: John Moses Browning's 163rd Birthday

Re: Using Cloud AutoML as an AI for an Anti-spam filter ?

Posted by Dave Warren <dw...@thedave.ca>.
On Tue, Jan 23, 2018, at 02:55, Zulma Pape wrote:
> In other words, can we integrate the Cloud AutoML into our server's
> spam filter and make it behave the same way Gmail behave ?
In short, not without a *lot* of work.

Gmail implements a lot more complexity, and they have a lot more data
than you. One example is that they track user interaction with email,
things like what messages does a user delete without reading, what
messages are opened and for how long, are links clicked, replies
generated, etc.

They also have a very wide view of all the email around the world, and
therefore are very likely to spot new botnets, changes in spammer
techniques, and also changes in legitimate mail far faster than almost
anyone else.

Bayesian is good, per-user bayesian is better, but Gmail can build
bayesian databases without the user's help simply based on their
activity combined with generalized multiple user filters. They can also
use this type of learning to split out mailing lists, receipts,
advertising, scams and others in a general sense, and then apply some
logic to determine if this particular user is likely receptive to the
classifications of messages.

You could reproduce all of this to the best of your data, but you also
need a relatively massive dataset and ability to collect a lot of
details about your user activity to really make it work.

On the other hand, you can make unilateral decisions under the "my
server, my rules" policy to customize and tweak your own filters in a
way that Google cannot.
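
To make the "learn from user activity" idea above a bit more concrete,
here is a rough Python sketch. Everything in it is an assumption made
for illustration: the Interaction fields, the 10-second reading
threshold, the shrinkage constant k, and the idea of blending a global
score with a per-user score are not taken from Gmail or SpamAssassin.
It only shows how implicit signals might become training labels, and
how a per-user score could be trusted more as that user's data grows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """Hypothetical record of what one user did with one message."""
    opened: bool
    seconds_read: float
    clicked_link: bool
    replied: bool
    deleted_unread: bool


def implicit_label(event: Interaction) -> Optional[str]:
    """Guess a training label from behaviour; None means 'too ambiguous'.

    These rules are illustrative assumptions, not a known production policy.
    """
    if event.replied or event.clicked_link:
        return "ham"
    if event.deleted_unread:
        return "spam"  # weak signal: people also delete unwanted ham unread
    if event.opened and event.seconds_read >= 10.0:
        return "ham"
    return None


def blended_score(global_p: float, user_p: float,
                  user_msgs: int, k: int = 50) -> float:
    """Blend a global spam probability with a per-user one.

    The per-user weight grows with the number of messages that user has
    trained on; k (assumed) is the count at which both weigh equally.
    """
    weight = user_msgs / (user_msgs + k)
    return weight * user_p + (1.0 - weight) * global_p
```

With no per-user data the blended score is simply the global one; after
k trained messages the two scores count equally. Treating "deleted
unread" as spam is the shakiest rule here, and on a small server it
would probably need a human check before feeding a Bayes database.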


Re: Using Cloud AutoML as an AI for an Anti-spam filter ?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2018-01-23 at 10:55 +0000, Zulma Pape wrote:
> Hi,
> 
> I have just read about the Cloud AutoML and how Google made it
> possible for users to train their own custom machine learning
> algorithms from scratch.
> 
That's very unlikely. What Google have released is a tool for training
their image recognition system framework. Here's The Register's article
about it:  https://www.theregister.co.uk/2018/01/18/google_automl/

TL;DR - if you build your own machine learning system on Google's ML
framework and libraries and customise it to parse e-mails instead of
scanning image files, then you'll be in a position to modify Google's
AutoML tool so it can train your ML system to distinguish ham from
spam. 

When all that's working you won't need Spamassassin any more because
your e-mail parsing AI will be doing everything you need.

 
Martin



Re: Using Cloud AutoML as an AI for an Anti-spam filter ?

Posted by "Kevin A. McGrail" <ke...@mcgrail.com>.
On 1/23/2018 5:55 AM, Zulma Pape wrote:
>
> I have just read about the Cloud AutoML and how Google made it 
> possible for users to train their own custom machine learning 
> algorithms from scratch.
>
> So the first thing that I got in my mind is, can we use this service 
> to build our own Spam Filter based on the users experience.
>
> In other words, can we integrate the Cloud AutoML into our server's 
> spam filter and make it behave the same way Gmail behave ?

Of course you can.  Build a corpus of ham and spam and use it to train.

ML really isn't "new", nor is "bigdata" to spam analysis.  Just new 
names for the existing items sometimes.  SA has been doing Bayesian 
automatic classification and training with distributed computing through 
masscheck/ruleqa for about 2 decades.  In fact, I would say anti-spam is 
one of the single best black-or-white decision-making problems for 
ML.  Most of my research focuses on shading that decision because 
it's not always that easy or the same for different people.

Anyway, good training is critical, so focus on your corpora!
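
As a rough illustration of "build a corpus of ham and spam and use it
to train", here is a minimal naive Bayes sketch in plain Python. It is
not SpamAssassin's Bayes implementation: the tokenizer, the add-one
smoothing, and the class and method names are all assumptions chosen to
keep the example short. It scores only the tokens present in a message,
which is a simplification.

```python
import math
import re
from collections import Counter

# Assumed, deliberately crude tokenizer: lowercase word-like runs.
TOKEN_RE = re.compile(r"[a-z0-9$'!-]+")


def tokenize(text):
    """Return the set of distinct tokens in a message."""
    return set(TOKEN_RE.findall(text.lower()))


class NaiveBayes:
    """Toy ham/spam classifier trained on a labelled corpus."""

    def __init__(self):
        self.token_counts = {"ham": Counter(), "spam": Counter()}
        self.msg_counts = {"ham": 0, "spam": 0}

    def train(self, text, label):
        """Record one message; label must be 'ham' or 'spam'."""
        self.msg_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | text) with add-one smoothing."""
        n_spam = self.msg_counts["spam"]
        n_ham = self.msg_counts["ham"]
        # Start from the smoothed prior odds of spam versus ham.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in tokenize(text):
            p_spam = (self.token_counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.token_counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Usage would be along the lines of calling train(text, "spam") or
train(text, "ham") for every message in the corpus, then treating a
spam_probability() above some threshold as spam. How well it works
depends almost entirely on how clean and representative the corpus is.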

Regards,

KAM